The entropy definition is deduced by (re)deriving the generalized non-linear Langevin equation using the Zwanzig projection-operator formalism. It is shown to be necessarily related to an invariant measure which, in classical mechanics, can always be taken to be the Liouville measure. It is not true that one is free to choose a "relevant" probability density independently, as is done in other flavors of projection-operator formalism. This observation induces an entropy expression that is also valid outside the thermodynamic limit and in far-from-equilibrium situations. The Zwanzig projection-operator formalism therefore gives a deductive derivation of non-equilibrium, and equilibrium, thermodynamics. The entropy definition found is closely related to the (generalized) microcanonical Boltzmann-Planck definition but with some subtle differences. No "shell thickness" arguments are needed, nor desirable, for a rigorous definition. The entropy expression depends on the choice of macroscopic variables and does not exactly transform as a scalar quantity. The relation with expressions used in the GENERIC formalism is discussed.

I. INTRODUCTION

The classical, Boltzmann-Planck, definition of entropy is the logarithm of the number of microstates corresponding to a macroscopic state of a system (times the Boltzmann constant $k_B$). Within classical mechanics this definition causes some fundamental problems since the number of microstates is not countable. The common resolution is to define a unit volume of microscopic phase space. Within classical mechanics the motivation for why this is reasonable is found in Liouville's theorem (incompressibility of phase space velocity). To quantify what the unit of phase space should be one usually resorts to quantum mechanics, heuristically, to the uncertainty relation. A unit volume in microscopic phase space, specified by positions and momenta of all $N$ particles, is according to this reasoning proportional to $h^{3N}$. In the setting of the microcanonical, iso-energy, ensemble this does not resolve the issue fully, since the iso-energy surface has zero Liouville measure. In this case a finite shell thickness is usually assumed.

Several lines of reasoning are encountered to motivate this thickness. The first is to refer, again, to quantum mechanics and the uncertainty relation. A second is that the thickness of the shell region is somehow set by uniformity of fluctuations. The validity of both can be debated, especially because, within the classical setting, in a closed system there are no fluctuations of the total energy. A third, well founded, line of reasoning is that the thickness is irrelevant in the thermodynamic limit. In this view entropy is only fully unambiguously defined in this limit. Indeed the proof that the extensive entropy expression in this limit is independent of the shell thickness can be found in classical monographs [?].

In the current paper we will show that the definition that follows from the Zwanzig projection-operator formalism [3], which leads to a generalized (non-linear) Langevin equation, gives rise to an entropy definition that needs no reference to a shell thickness. Furthermore it can be defined for any set of macroscopic variables. In the case of non-conserved quantities, the shell argument is hard to defend (since an ensemble does not remain within such a shell). One does not have to worry about that since the non-shell definition is the "real" entropy definition.

The usual reasoning of why entropy is an important quantity has to do with ergodicity-like arguments, i.e., sampling large parts of microscopic phase space by "trajectories". In other words there is a connection to dynamics. A formal mathematical tool to discuss the connection between microscopic dynamics and thermodynamics is projection operator formalism [?]. It is a method for decomposing equations of motion. So, in principle it gives a different representation of an already known exact equation. Its use is that it may point the way toward good modeling assumptions and toward well-chosen approximations. When defining macroscopic variables to describe a system it can be used to "project" the microscopic dynamics onto the macroscopic phase space.

One might expect that, as a by-product of this procedure, the entropy definition arises. Usually, this is not the way the projection operator is constructed. In the conventional practice one is free to choose macroscopic variables as well as a "relevant" probability distribution [?]. The relevant probability distribution follows from independent, statistical mechanics, reasoning.

There are many flavors of projection operations (Robertson, Mori, Zwanzig, Kawasaki etc.). The reason is that one has quite a lot of freedom to define projection operators. The common methodology is to make use of a Hilbert-space description of a system, e.g., in [?]. One basic ingredient of creating such a Hilbert space is defining an inner product. Using this inner product one can then construct projection operators. As an input for defining the inner product one uses the equilibrium distribution function of a system. Since the equilibrium distribution is a result derived from statistical mechanics, with an implicit entropy definition, the hope to find an entropy definition from first principles is lost in this case.

An often used flavor of projection operator formalism, originating in the work of Mori [?], results in generalized linear Langevin equations. In this linear case, the system is characterized by the expectation values (or rather ensemble averages) of a set of macroscopic variables. The resulting equations are useful only near thermodynamic equilibrium. The reason is that expectation values of non-linear functions of the variables (higher moments), which are also likely to relax slowly, are not part of the set of macroscopic variables. These non-linear functions of the macroscopic variables can "hide" in orthogonal subspaces of the Hilbert space and are therefore not filtered out by the projection operator. These slow relaxation time scales show up as slowly decaying memory functions. For useful approximations further from equilibrium, using this Mori approach, higher order moments and cross-correlations (mode-couplings) also have to be taken into account (see [3] for a basic explanation).

A second flavor of projection operator formalism is the one introduced earlier by Zwanzig [?]. In this case the system is characterized by a full set of macroscopic variables, and not only the expectation values of the first moments. This is much more restrictive on the choice of (reasonable) projection operators [?]. It produces a non-linear generalized Langevin equation. This flavor of the projection operator method is used in the current paper because, after making approximations, it has the potential to be valid also far from equilibrium. Both the Mori equations and the Zwanzig equations are formally exact. The Zwanzig approach, however, allows for better approximations in non-equilibrium situations. It is therefore probably a better starting point to develop non-equilibrium thermodynamics.

Also when applying formal projection operator techniques in the non-linear case one can, in principle, use any "relevant" distribution to define the projection operators. The subtlety we want to point out here is that only specific properties of the distribution, namely, invariance with respect to the microscopic dynamics, allow one to derive the non-linear Langevin equation in a useful form. In classical mechanics the Liouville measure is always invariant. This is therefore the most obvious choice to use, and it defines entropy. Note that this choice is not restrictive since the development of the equation is still strictly a formal decomposition. No modeling assumptions are needed to arrive at this result. The fact that one is quite restricted in the choice of a sensible "relevant" distribution, in fact that it is necessarily an invariant distribution (or, rather, measure), is the main observation of this paper.

In early accounts of projection-operator formalism, Zwanzig [3] uses the same microcanonical distribution that follows from the arguments presented here. It is sometimes still applied, e.g., in [?]. In later derivations, especially those that make use of a Hilbert space, e.g., [?], a freedom to impose an equilibrium or relevant distribution is suggested.

In this paper we will circumvent, as much as possible, mathematical constructs such as densities in phase space and vectors in Hilbert space. The underlying philosophy is that an ensemble is a mathematical construct that should be used with caution. A system is in "reality" only in one microstate. By using a description in terms of densities one is tempted to see this as something "real", e.g., to view the ensemble as the state of a system. By making approximations on the level of phase space densities, or on Hilbert space vectors, one can construct objects that have little to do with reality. The fact that projection operator formalism has so many flavors is mainly due to choices made at this abstract level. To stay as close to reality as possible we will try to remain at the level of equations of observables (quantities) and not ensembles or vectors in Hilbert space.

For the sake of completeness this paper will start out 
with giving the derivation of the generalized non-linear 
Langevin equation. The derivation will put emphasis on 
the role of the invariant measure and appearance of an 
entropy related to this. The reason is that an impor- 
tant goal of the current paper is to present entropy as 
something that arises naturally in the course of deriving 
the non-linear generalized Langevin equation. One does 
not need independent statistical mechanical reasoning to 
pose a "relevant" probability distribution. One needs no 
qualitative arguments, or connections to quantum me- 
chanics, to justify the form of the entropy expression. 

The definition of the entropy that arises is very similar to the Boltzmann-Planck-Einstein definitions, but there is a subtle difference. In the definition found here $\exp[S(X)]$ is the number density of microstates corresponding to a macrostate. So, when one considers a small volume $\delta V_X$ around macrostate $X$ then, in sloppy notation,

$$\delta V_\Gamma = \exp[S(X)]\,\delta V_X \tag{1}$$

is the volume (Liouville measure) of the corresponding region in microscopic phase space. In this paper the Boltzmann constant $k_B$ will be put to 1. The entropy that arises in the "Einstein distribution" is exactly the entropy as defined here.

This definition raises several questions. Firstly, entropy seems badly defined since the ratio of volumes is not dimensionless, so one is taking the logarithm of a dimensional number. Therefore entropy transforms strangely upon a change of units (a constant should be added). Related to this, entropy is not a scalar quantity. Upon a coordinate transformation of the macroscopic space, entropy changes in a non-trivial way. We will argue that this is just the way it is and everything works out fine.

The projection operator formalism suggests that one can define an entropy for any set of macroscopic variables. A modern formalism for non-equilibrium thermodynamics, very much related to the current approach, is the GENERIC formalism [?]. This formalism should be derivable from the generalized non-linear Langevin equation after some controlled modeling assumptions. An attempt to prove the formalism on the basis of projection operator formalism can be found in [?]. There the same definition of the entropy is given as in Eq. (1). However, to avoid the conclusion that entropy does not behave as a scalar upon coordinate transformation, a preferred coordinate system is introduced for the macroscopic state space in [?, p. 228]. We will argue it is not necessary to introduce a preferred coordinate system.

The structure of GENERIC is richer than that found by means of the projection operator formalism alone. There is a Poisson structure for the reversible part, and two extra degeneracy conditions. In [?, 11, 12] this extra structure could not be proved, but was argued to be very likely. The degeneracy conditions can be directly derived from projection operator formalism, but the Poisson structure cannot. This will be discussed.

The entropy expression follows directly from the non-linear Langevin equation before any modeling assumptions are made. The usual derivation, in equilibrium thermodynamics, of entropy expressions uses ergodicity arguments. Since we derive entropy in a dynamical setting, ergodicity is never strictly obeyed. This raises questions about the relation between entropy and ergodicity. This will be discussed near the end of the paper. The approach outlined here could be a starting point for applying non-equilibrium thermodynamics to systems outside the linear regime or outside the thermodynamic limit, i.e., for small systems.



II. THE NONLINEAR LANGEVIN EQUATION 

The derivation of the nonlinear Langevin equation using projection operator formalism can be found in many standard texts and papers [?]. Nevertheless, we will give a straightforward derivation here. The reason for giving it is to make this paper self-contained and to be able to point out specifically where, and how, the notion of entropy enters.

This derivation of the non-linear Langevin equation will start out by closely following that of Kawasaki [13]. We will use $\Gamma$ to denote a point in microscopic phase space and $X$ for a point in macroscopic phase space. To every point $\Gamma$ in microscopic phase space one point $X(\Gamma)$ corresponds. This relation is generally not invertible. A point in macroscopic phase space corresponds to a whole subspace in microscopic phase space.

For any quantity $A(\Gamma)$ the time development is described by means of a Liouville operator $\mathcal{L}$, formally,

$$A_t = \exp[i\mathcal{L}t]\,A_0. \tag{2}$$

Here $A_t = A(\Gamma_t) = A(\Gamma_t, 0) = A(\Gamma_0, t)$, where $\Gamma_0$ is an initial point in phase space and $\Gamma_t$ the state it has evolved into at time $t$. The use of the imaginary unit $i$ is a widely used convention, originating from the quantum mechanics formalism, such that $\mathcal{L}$ is Hermitian. It is of no consequence here and one can consider the product $i\mathcal{L}$ as one operator. The exact form of $i\mathcal{L}$ is of no importance for the general derivation.

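In a classical mechanics setting the microscopic evolution can always be thought of as a trajectory through phase space, parameterized by $\Gamma_t$. For convenience we introduce the following notation for the time derivative,

$$\dot{A}_t = \frac{d}{dt} A(\Gamma_t) = \frac{d}{dt} A(\Gamma_t, 0) = \frac{\partial}{\partial t} A(\Gamma_0, t) = i\mathcal{L} A_t. \tag{3}$$

A useful operator identity given by Kawasaki [13] is

$$\frac{d}{dt}\exp[i\mathcal{L}t] = \exp[i\mathcal{L}t]\, i\mathcal{L}_0 + \int_0^t ds\, \exp[i\mathcal{L}s]\, i\mathcal{L}_0 \exp[(i\mathcal{L} - i\mathcal{L}_0)(t-s)]\,(i\mathcal{L} - i\mathcal{L}_0) + \exp[(i\mathcal{L} - i\mathcal{L}_0)t]\,(i\mathcal{L} - i\mathcal{L}_0). \tag{4}$$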

The proof of this identity is straightforward when realizing that the integrand can also be written as

$$\frac{d}{ds}\Big\{\exp[i\mathcal{L}s]\, \exp[(i\mathcal{L} - i\mathcal{L}_0)(t-s)]\,(i\mathcal{L} - i\mathcal{L}_0)\Big\}. \tag{5}$$

The identity is valid for any additional Liouville operator $\mathcal{L}_0$.
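The identity Eq. (4) is purely algebraic, so it can be checked numerically with finite matrices standing in for the operators. The following is a minimal sketch under that assumption: the real matrices `L` and `L0` play the roles of $i\mathcal{L}$ and $i\mathcal{L}_0$, and the helper names `expm` and `kawasaki_rhs` are introduced here for illustration only.

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential via scaling-and-squaring with a Taylor series."""
    norm = np.linalg.norm(M, ord=1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / (2.0 ** squarings)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def kawasaki_rhs(L, L0, t, nodes=40):
    """Right-hand side of Eq. (4); the s-integral uses Gauss-Legendre quadrature."""
    Q = L - L0
    x, w = np.polynomial.legendre.leggauss(nodes)
    s_vals = 0.5 * t * (x + 1.0)   # map nodes from [-1, 1] to [0, t]
    weights = 0.5 * t * w
    integral = np.zeros_like(L)
    for s, ws in zip(s_vals, weights):
        integral += ws * (expm(L * s) @ L0 @ expm(Q * (t - s)) @ Q)
    return expm(L * t) @ L0 + integral + expm(Q * t) @ Q

rng = np.random.default_rng(0)
L = rng.normal(size=(4, 4))    # stands in for i*L (assumption: finite matrix)
L0 = rng.normal(size=(4, 4))   # stands in for i*L_0, completely arbitrary
t = 0.7

lhs = expm(L * t) @ L          # d/dt exp[Lt] = exp[Lt] L
rhs = kawasaki_rhs(L, L0, t)
relative_error = np.max(np.abs(lhs - rhs)) / np.max(np.abs(lhs))
print(relative_error)
```

Because `L0` is drawn at random, the check also illustrates the remark that the identity holds for any additional operator, not only for a projected one.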

In projection operator formalism, $\mathcal{L}_0$ in Eq. (4) is the projected Liouville operator,

$$i\mathcal{L}_0 = \mathcal{P}\, i\mathcal{L}. \tag{6}$$

Here $\mathcal{P}$ is the projection operator. This operator will be specified further below. For now we will just perform a formal exercise.

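Applying the identity Eq. (4) to the macroscopic state $X_t = X(\Gamma_t)$, with the time derivative of Eq. (3) and the definition Eq. (6), gives

$$\dot{X}_t = \exp[i\mathcal{L}t]\,\mathcal{P} i\mathcal{L} X_0 + \int_0^t ds\, \exp[i\mathcal{L}s]\,\mathcal{P} i\mathcal{L} \exp[(1-\mathcal{P})\, i\mathcal{L}\,(t-s)]\,(1-\mathcal{P})\, i\mathcal{L} X_0 + \exp[(1-\mathcal{P})\, i\mathcal{L}\, t]\,(1-\mathcal{P})\, i\mathcal{L} X_0. \tag{7}$$

The last term on the right-hand side of Eq. (7) is used as the definition of the fluctuating term,

$$f^{\mathrm{fluct}}(\Gamma_0, s) = \exp[(1-\mathcal{P})\, i\mathcal{L}\, s]\,(1-\mathcal{P})\, i\mathcal{L} X_0. \tag{8}$$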
This term also arises inside the integral in the second term. The value of $f^{\mathrm{fluct}}$ depends on the time $s$ passed since preparation in the initial state $\Gamma_0$. The reason why Eq. (8) is referred to as the fluctuating term will become more apparent below. Using definition Eq. (8), one can rewrite Eq. (7) as

$$\dot{X}_t = \exp[i\mathcal{L}t]\,\mathcal{P} i\mathcal{L} X_0 + \int_0^t ds\, \exp[i\mathcal{L}s]\,\mathcal{P} i\mathcal{L}\, f^{\mathrm{fluct}}(\Gamma_0, t-s) + f^{\mathrm{fluct}}(\Gamma_0, t). \tag{9}$$

Note that up to this point we only performed a formal decomposition of the evolution equation and introduced some definitions. No extra assumptions were introduced.



A. The projection operator 

A convenient property of using Liouville operators is 
that one can consider ensembles of (initial) states. An 
ensemble is characterized by a measure or density. Mea- 
sures (loosely speaking, statistical weights) are assigned 
to all microstates. A quantity maps each microscopic 
state to a (finite vector of) real number(s). The mea- 
sure can be used to weigh these values. Mathematically 
the measure characterizing the ensemble is the dual ob- 
ject of a quantity. By using the measure one can map 
the values of a quantity corresponding to an ensemble 
of microstates to one (vector of) real number(s). The 
full mathematical structure of ensembles and measures 
is called a sigma-algebra.

The space of possible measures can be restricted to rep- 
resent probability measures. During time evolution clas- 
sical mechanics does not destroy the non-negative proper- 
ties corresponding to probability measures. The pairing 
of ensembles and quantities can therefore be interpreted 
as computing expectation values. Note that these expectation
values are not necessarily related to "reality".
Actually, ensembles and the corresponding measures are 
primarily a mathematical tool, because in reality a sys- 
tem is always in one microstate. 

The mathematical structure of measures on ensem- 
bles can be used to decompose the equations of motion 
in a "relevant" and "irrelevant" part. This is done in 
projection-operator formalism and will be outlined below. The procedure is formal. To arrive at a useful decomposition, a restriction on the choice of relevant ensembles must be made.

In the derivation of linear generalized Langevin equations one usually starts by introducing a Hilbert space. The projection operator is next defined using the inner product on this space. For non-linear Langevin equations the introduction of a Hilbert space is not necessary. So, we will not use this approach here. This is the point where the current derivation starts to deviate from Kawasaki's derivation and becomes more similar to the early derivation of Zwanzig [3]. The difference with Zwanzig is that we will remain on the level of Langevin equations and will not attack the problem via the Fokker-Planck side.

A general (linear) projection operator of a microstate $\Gamma$ onto its corresponding macrostate $X(\Gamma)$ can be immediately introduced. Operating on a quantity $A(\Gamma)$ the projection gives

$$(\mathcal{P}A)(\Gamma) = \int A(\Gamma')\, \mu[d\Gamma' \,|\, X(\Gamma') \in dX(\Gamma)]. \tag{10}$$

Here we used a measure theoretic notation of the integral. The measure used is a conditional measure based on the, still to be defined, measure $\mu$. With the notation $X(\Gamma') \in dX(\Gamma)$ we mean the set of $\Gamma'$s such that $X(\Gamma')$ is in a small, size $\epsilon$, measurable neighborhood around the macrostate $X(\Gamma)$. We consider the limit $\epsilon \to 0$. We assume that the conditional measure is well defined such that, in the limit $\epsilon \to 0$, the integrals using this conditional measure of a large class of sufficiently smooth functions exist (and the limit is smooth).

At this point one might suspect we are introducing the finite shell model that was argued against in the introduction. There are two subtleties to be considered here. Firstly, indeed, an $\epsilon$-region around macrostate $X$ is considered. Depending on the choice of the measure $\mu$ in the microscopic space, the corresponding region in microscopic phase space looks like a shell that has fixed width or is more 'wobbling'. Note that at this point, however, the choice of $\mu$ is still completely open. Secondly, the conditional measure remains well defined in the limit $\epsilon \to 0$. Therefore the width is not finite.

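The conditional measure is defined as

$$\mu[d\Gamma \,|\, X(\Gamma) \in dX] \equiv \frac{\mu[d\Gamma \cap \{\Gamma : X(\Gamma) \in dX\}]}{\mu[\{\Gamma : X(\Gamma) \in dX\}]} \tag{11}$$

if $X(\Gamma) \in dX$, and zero otherwise. It obeys, trivially, the property

$$\int \mu[d\Gamma' \,|\, X(\Gamma') \in dX(\Gamma)] = 1. \tag{12}$$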

For Eq. (10) to define a projection operator it should leave properties that depend only on the macrostate, $A(X(\Gamma))$, unchanged. Due to property Eq. (12) this requirement is obeyed. So, Eq. (10) is indeed a projection operation. In principle any underlying measure, $\mu$, defines by means of Eq. (11) a projection operator and by means of Eq. (7) a decomposition corresponding to this projection.
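The projection property can be made concrete in a discrete analogue of Eq. (10). The sketch below assumes a finite set of microstates, each carrying an integer macrostate label and a positive weight that stands in for the measure $\mu$; the function name `project` and all variable names are hypothetical and chosen for this illustration only.

```python
import numpy as np

def project(A, X, weights):
    """Discrete analogue of Eq. (10): replace A on each microstate by the
    weighted average of A over all microstates sharing its macrostate."""
    A = np.asarray(A, dtype=float)
    PA = np.empty_like(A)
    for x in np.unique(X):
        mask = (X == x)
        PA[mask] = np.sum(weights[mask] * A[mask]) / np.sum(weights[mask])
    return PA

rng = np.random.default_rng(1)
n_micro = 12
X = rng.integers(0, 3, size=n_micro)       # macrostate label of each microstate
weights = rng.uniform(0.5, 2.0, n_micro)   # arbitrary positive measure
A = rng.normal(size=n_micro)               # arbitrary observable A(Gamma)

PA = project(A, X, weights)

# Applying the projection twice changes nothing.
idempotent = np.allclose(project(PA, X, weights), PA)

# Functions of the macrostate alone are left unchanged.
g_of_X = np.cos(X.astype(float))
keeps_macro_functions = np.allclose(project(g_of_X, X, weights), g_of_X)

# The complement (1 - P) A has zero conditional average in every macrostate.
complement_averages_out = np.allclose(project(A - PA, X, weights), 0.0)

print(idempotent, keeps_macro_functions, complement_averages_out)
```

All three properties hold for any choice of positive weights, which mirrors the statement that any underlying measure defines a projection operator.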

The conditional measure, Eq. (11), defines a generalized microcanonical ensemble. All microstates, $\Gamma$, consistent with a macrostate $X(\Gamma)$ contribute with a certain weight. As an alternative notation to Eq. (10) one can write it as a conditional expectation value,

$$(\mathcal{P}A)(X) = E(A \,|\, X). \tag{13}$$

Using the expectation value notation gives, e.g., that

$$\exp[i\mathcal{L}t]\,\mathcal{P} i\mathcal{L} X_0 = \exp[i\mathcal{L}t]\, E(\dot{X} \,|\, X_0) = E(\dot{X} \,|\, X_t). \tag{14}$$
This expression should be read as the conditional expectation value corresponding to macrostate $X_t$ of the instantaneous macroscopic phase space velocity, $\dot{X} = \dot{\Gamma}' \cdot \partial X(\Gamma')/\partial \Gamma'$ (where $X(\Gamma') = X_t$). Using this notation, Eq. (7) can be rewritten as

$$\dot{X}_t = E(\dot{X} \,|\, X_t) + \int_0^t ds\, E\big(i\mathcal{L} f^{\mathrm{fluct}}(\cdot, t-s) \,\big|\, X_s\big) + f^{\mathrm{fluct}}(\Gamma_0, t). \tag{15}$$

It is worthwhile to spend some time on the interpretation of Eq. (15). By means of the definition of the fluctuating term, Eq. (8), one knows that, when letting the projection operator $\mathcal{P}$ act on $f^{\mathrm{fluct}}$, it will give zero,

$$E\big(f^{\mathrm{fluct}}(\cdot, t) \,\big|\, X(\Gamma)\big) = (\mathcal{P} f^{\mathrm{fluct}})(\Gamma, t) = 0. \tag{16}$$

This explains the terminology "fluctuation". Note that, to lose the fluctuation term in Eq. (15), one needs to average over all possible initial microstates consistent with macrostate $X_0$. Although a microstate is initially consistent with $X_0$ this does not mean that, at a later time, it is still consistent with $X_t = X(\Gamma_t)$. The expectation values arising inside Eq. (15) are averages over microstates consistent with the attained macrostate $X_t = X(\Gamma_t)$. The averaging over initial conditions (consistent with $X_0$) therefore gives rise to an averaging over possible values $X_t$ evolved from initial microscopic states consistent with $X_0$ (but not attained in reality).

In Eq. (15) the initial time has a special significance. Taking a different time origin will give a different equation. By taking the current time as the time origin the time integral is zero. So, for this special case one can write an alternative form of $\dot{X}_t$,

$$\dot{X}_t = E(\dot{X} \,|\, X_t) + f^{\mathrm{fluct}}(\Gamma_t, 0). \tag{17}$$

This identity will also be used later on.



B. The entropy definition

Finite classical systems are defined on a phase space characterized by the coordinates and momenta of $n$ particles, so by $6n$ real numbers. On a space $\mathbb{R}^{6n}$ one can define a Lebesgue measure. The Lebesgue measure is based on the notion of a volume of hypercubes. Starting from this basic definition, it generalizes the notion of volume to more elaborate sets, namely, members of the sigma-algebra. For sufficiently smooth measures $\mu$ one can relate this measure to the Lebesgue measure as

$$\mu(d\Gamma) = w(\Gamma)\, \mu_L(d\Gamma). \tag{18}$$

For the proof given below the important property of the Lebesgue measure is that it is translationally invariant. If the smoothness conditions are not met, and the relevant measure might be a fractal one, then a splitting as given by Eq. (18) might still be possible with a translation-invariant fractal measure instead of the Lebesgue measure.

To relate the conditional measure to the Lebesgue measure, the denominator in Eq. (11) should also be related to a Lebesgue measure. This is the point at which the entropy is introduced. Here,

$$\mu[\{\Gamma : X(\Gamma) \in dX\}] = \int \mu[d\Gamma \cap \{\Gamma : X(\Gamma) \in dX\}] = \exp[S(X)]\, \mu_L(dX), \tag{19}$$

if $X(\Gamma) \in dX$ and zero otherwise. Note that the Lebesgue measure that appears is defined on the macroscopic space as opposed to the microscopic one. It arises because, if $\mu$ is sufficiently smooth, the sum of the measures of two small neighboring (non-overlapping) sets equals the total measure of the union of these sets. All possible prefactors are incorporated in the entropy definition. One can read Eq. (19) as follows. The factor $\exp[S(X)]$ is the measure of microscopic phase space per unit volume (Lebesgue measure) of macroscopic phase space. Note that if $\mu$ was not sufficiently smooth one still might have been able to introduce an entropy if one could find a suitable translation-invariant fractal measure on the macroscopic phase space.

Using the definitions Eq. (18) and Eq. (19) and the formal definition of the Dirac delta function one can write the conditional measure as

$$\mu[d\Gamma \,|\, X(\Gamma) \in dX] = w(\Gamma)\, \exp[-S(X)]\, \delta[X(\Gamma) - X]\, \mu_L(d\Gamma). \tag{20}$$

This expression is convenient because it clearly separates the parts that depend on $\Gamma$ directly and others that only depend on $\Gamma$ through the macrostate, i.e., via $X(\Gamma)$.

In Eq. (15) conditional expectation values of the form $E(\dot{A} \,|\, X_t)$ play an important role. Using the form given by Eq. (20) combined with Eq. (12) and Eq. (10) a second representation can be found as
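$$\begin{aligned}
E(\dot{A} \,|\, X) &= \exp[-S(X)] \int \mu_L(d\Gamma)\, w(\Gamma)\, \dot{\Gamma} \cdot \frac{\partial A}{\partial \Gamma}\, \delta[X(\Gamma) - X] \\
&= -\exp[-S(X)] \int \mu_L(d\Gamma)\, A(\Gamma)\, \frac{\partial}{\partial \Gamma} \cdot \big(\dot{\Gamma}\, w(\Gamma)\big)\, \delta[X(\Gamma) - X] \\
&\quad + \exp[-S(X)]\, \frac{\partial}{\partial X} \cdot \int \mu_L(d\Gamma)\, \dot{X}(\Gamma)\, A(\Gamma)\, w(\Gamma)\, \delta[X(\Gamma) - X] \\
&= -E\Big[w^{-1}\, \frac{\partial}{\partial \Gamma} \cdot (\dot{\Gamma}\, w)\, A \,\Big|\, X\Big] + \exp[-S(X)]\, \frac{\partial}{\partial X} \cdot \Big(\exp[S(X)]\, E(\dot{X} A \,|\, X)\Big).
\end{aligned} \tag{21}$$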



Now comes the main mathematical point of this paper. The final expression of Eq. (21) consists of two terms. In most historic derivations the first term is taken to be zero. There are good reasons for doing this, but it is usually done without explicitly mentioning it. What we want to do here is to discuss when this term is zero, and what the consequences are for the entropy expression. Clearly the term is zero when $\mu$ is an invariant measure, i.e., if

$$\frac{\partial}{\partial \Gamma} \cdot [\dot{\Gamma}\, w(\Gamma)] = 0. \tag{22}$$



Depending on the ergodic properties of the dynamics there are one or many invariant measures. Independent of the detailed dynamics, in classical mechanics,

$$w(\Gamma) = 1 \tag{23}$$

is always a valid choice.

The reason is the Liouville theorem. For a closed classical system described by Hamilton's equations the Liouville theorem, i.e., incompressibility of phase space, holds. In mathematical terms this is expressed as $\partial/\partial\Gamma \cdot \dot{\Gamma} = 0$. So, Eq. (22) is always obeyed for constant $w$. This choice determines, by means of Eq. (20), the conditional measure, and consequently also the entropy, by Eq. (19).
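As a worked illustration of these two statements, consider a standard textbook case that is not taken from the paper itself: a single one-dimensional harmonic oscillator with mass $m$ and frequency $\omega$, with the energy as the only macroscopic variable. The symbols $q$, $p$, $H$, $m$ and $\omega$ are assumptions of this example.

```latex
% Liouville theorem for Hamiltonian dynamics, \Gamma = (q, p):
\dot{q} = \frac{\partial H}{\partial p}, \qquad
\dot{p} = -\frac{\partial H}{\partial q}
\quad\Longrightarrow\quad
\frac{\partial}{\partial \Gamma} \cdot \dot{\Gamma}
  = \frac{\partial^2 H}{\partial q\, \partial p}
  - \frac{\partial^2 H}{\partial p\, \partial q} = 0 .

% Harmonic oscillator, macroscopic variable X = H:
H(q, p) = \frac{p^2}{2m} + \frac{1}{2} m \omega^2 q^2 .

% The region H \le E is an ellipse with semi-axes
% \sqrt{2mE} and \sqrt{2E/(m\omega^2)}, so its Liouville measure is
\mu_L[\{\Gamma : H(\Gamma) \le E\}]
  = \pi \sqrt{2mE}\, \sqrt{\frac{2E}{m \omega^2}}
  = \frac{2 \pi E}{\omega} .

% Eq. (19) with w = 1: measure per unit of macroscopic variable
\exp[S(E)] = \frac{d}{dE}\, \frac{2 \pi E}{\omega} = \frac{2\pi}{\omega},
\qquad
S(E) = \ln \frac{2\pi}{\omega} .
```

No shell thickness enters at any point. The argument of the logarithm carries the dimension of time, which is an instance of the dimensional issue raised in the introduction.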

Taking $w$ to be invariant, the first term in Eq. (21) cancels, and Eq. (15) becomes

$$\dot{X}_t = E(\dot{X} \,|\, X_t) + \int_0^t ds\, \exp[-S(X_s)]\, \frac{\partial}{\partial X_s} \cdot \Big(\exp[S(X_s)]\, E\big(\dot{X}\, f^{\mathrm{fluct}}(\cdot, t-s) \,\big|\, X_s\big)\Big) + f^{\mathrm{fluct}}(\Gamma_0, t). \tag{24}$$

Combining the properties Eq. (16) and Eq. (17) one finds that

$$E\big(\dot{X}\, f^{\mathrm{fluct}}(\cdot, t-s) \,\big|\, X_s\big) = E\big([E(\dot{X} \,|\, X_s) + f^{\mathrm{fluct}}(\cdot, 0)]\, f^{\mathrm{fluct}}(\cdot, t-s) \,\big|\, X_s\big) = E\big(f^{\mathrm{fluct}}(\cdot, 0)\, f^{\mathrm{fluct}}(\cdot, t-s) \,\big|\, X_s\big), \tag{25}$$

so that

$$\dot{X}_t = E(\dot{X} \,|\, X_t) + \int_0^t ds\, \exp[-S(X_s)]\, \frac{\partial}{\partial X_s} \cdot \Big(\exp[S(X_s)]\, E\big(f^{\mathrm{fluct}}(\cdot, 0)\, f^{\mathrm{fluct}}(\cdot, t-s) \,\big|\, X_s\big)\Big) + f^{\mathrm{fluct}}(\Gamma_0, t). \tag{26}$$

This is the generalized nonlinear Langevin equation. The shape of the equation clearly illustrates the fluctuation-dissipation relation. This equation is generally valid. The only ingredient in this derivation, besides straightforward formal mathematical manipulation, is that for the entropy definition an invariant measure was used.

The derivation is valid for general closed systems. Note that open systems assume that it is possible to make a division between system and environment. This separation is always an approximation. In the current setting we will (realistically) describe the total system plus environment as a closed system. The full microscopic description of the whole is assumed to obey Liouville's theorem. Therefore, Eq. (26) is always valid for the whole. In the macroscopic description the environment might be modeled in a very elementary fashion, e.g., as a heat bath. The entropy that appears in Eq. (26) is, in the case of an open system, the total entropy of the system plus environment. This means that what is called entropy in the current paper is, for different kinds of environments, a quantity proportional to what is usually referred to as free energy, Gibbs energy, available energy, exergy, etc.



III. THE GENERALIZED MICROCANONICAL
ENSEMBLE AND ENTROPY



The ensemble given by Eq. (20), with $w(\Gamma) = 1$, is a generalized microcanonical distribution corresponding to macroscopic state $X$. This is a straightforward generalization of the energy based microcanonical ensemble. There are, however, some features that are worth pointing out.

Firstly, by means of Liouville's theorem and to obtain a useful generalized nonlinear Langevin equation, $w(\Gamma) = \text{constant}$ is the only sensible choice one can make without any extra knowledge of the dynamics.

The energy based ensemble is usually defined by introducing a finite, but small, shell thickness $\epsilon_0$. Because of conservation of energy the system remains within the shell. Traditionally the Liouville theorem, combined with a reasoning on ergodicity, is used to motivate the microcanonical ensemble. It might be worthwhile to note that the microcanonical ensemble is often used to determine thermodynamic behavior when changing the energy. So, the energy is not constant! It is a (slow) dynamic variable. If one considers a total energy that can change, it is difficult to motivate why the thickness $\epsilon_0$ should be the same for the shells at different $E$. These kinds of conceptual problems are not present in the current derivation. The choice for the entropy definition is such that a very inconvenient term in the generalized Langevin equation cancels. No ergodic reasoning is used. What remains to be shown is that the generalized non-linear Langevin equation is useful.

In the current case $X$ is a dynamic variable. There is no a priori assumption about its nature. In the definition of the generalized microcanonical distribution, Eq. (20), there seems to be a preferred coordinate system (of the macroscopic state) introduced because of the definition of the delta function. Because of the choice $w = 1$ the delta function defines an (infinitesimal) shell with unit thickness (per unit $dX$) in the microscopic phase space. One might wonder what the behavior upon coordinate transformation $X \to Y$ is. The reason one might, at that point, get confused when interpreting Eq. (20) is that one thinks of the entropy as a scalar quantity. As can be seen from Eq. (19), if one considers a smooth one-to-one transformation, $X \to Y(X)$, of the macroscopic space then,

exp[SpO] n L {dX) = /i[{r : X(T) e dX}} 



= t ,[{T:Y(X(T))e^.dX}} 



= exp[S(Y(X))} 



det 



dY 
dX 



ti L (dX), (27) 



or 



S(Y) = S(X) - In 



(28) 



This illustrates that upon a change of variables the
entropy does not transform as a scalar.
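The transformation rule of Eq. (28) can be checked numerically in one dimension. The sketch below is an illustration only: the entropy S(X) = -X^2/2 and the map Y = exp(X) are assumptions chosen for convenience and are not part of the derivation. It compares the measure of an interval in X with the measure of its image in Y, the latter computed with the transformed entropy.

```python
import math

def entropy_x(x):
    # Hypothetical entropy in the X parameterization (illustration only).
    return -0.5 * x * x

def entropy_y(y):
    # Eq. (28) for Y = exp(X): S(Y) = S(X) - ln|dY/dX|, with dY/dX = exp(X) = Y.
    x = math.log(y)
    return entropy_x(x) - math.log(y)

def midpoint_integral(f, a, b, n=20000):
    # Plain midpoint rule; sufficient for these smooth integrands.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Microscopic measure belonging to the macroscopic interval X in [a, b] ...
a, b = -0.5, 1.0
measure_x = midpoint_integral(lambda x: math.exp(entropy_x(x)), a, b)
# ... and to its image Y in [exp(a), exp(b)], using the transformed entropy.
measure_y = midpoint_integral(lambda y: math.exp(entropy_y(y)),
                              math.exp(a), math.exp(b))
print(measure_x, measure_y)
```

Both numbers agree to quadrature accuracy only because the logarithm of the Jacobian is subtracted; treating S as a scalar, S(Y) = S(X), would assign a different measure to the same set of microstates.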

One can make the choice not to accept this naturally
arising, non-scalar, scaling behavior. If one wants, for
some reason, to consider entropy as a scalar quantity then
there is a preferred "coordinate" system where Eq. (26)
holds with entropy S(X). If one now considers another
parameterization and takes entropy to be scalar, S(Y) =
S(X), then Eq. (26) has extra determinant terms due to
the coordinate transformation. If one does not want these
terms to arise then Eq. (28) should be used.



The traditional interpretation of exp[S(X)] is the number
of microscopic states for a macroscopic state X.
The interpretation in this paper is to view exp[S(X)]
as the Liouville measure of microscopic space per unit
(Lebesgue) volume of macroscopic space. This is more easily
defined (at least in classical theory) because states
cannot be counted, but volume is well defined. The
mathematics shows that this interpretation also gives the
simplest equations. The form of the generalized Langevin
equation, Eq. (26), is independent of
the chosen parameterization.



IV. STOCHASTIC DIFFERENTIAL EQUATIONS



The generalized Langevin equation, Eq. (26), is a formal
decomposition of the microscopic equations of motion.
It contains no new information. Full expressions
for the fluctuating term $f^{\rm fluct}(\Gamma_0, t)$ are very complicated.
Its use lies in the fact that it can be used as a starting
point for approximations.

Suitable choices for the macroscopic variables X can be
made. The usual approach is to choose the variables such
that the remainder, characterized by $f^{\rm fluct}(\Gamma_0, t)$,
decorrelates quickly. If this is the case the simplest modeling
assumption for the fluctuation term $f^{\rm fluct}(\Gamma_0, t)$ is that
it is white noise, i.e., a stochastic Gaussian process with
decorrelation time 0.

In reality, of course, there is a finite decorrelation time
$\tau$. The modeling assumption is that (complete) decorrelation
is very fast, i.e., the change of X is very small on
the time scale $\tau$. One is interested in phenomena on time
scales much bigger than $\tau$. For time scales $\Delta t \gg \tau$ the
$f^{\rm fluct}$ can be modeled by means of a Wiener process W,



$$
\int_0^{\Delta t} f^{\rm fluct}(\Gamma_0, t)\,dt = \sqrt{2\tilde{D}}\cdot\Delta W, \tag{29}
$$

where $\tilde{D}$ is a positive definite matrix. A Wiener process
is a Gaussian stochastic process. Each increment over a
time-step $\Delta t$ has zero average and variance $\Delta t$,



$$
\langle \Delta W \rangle = 0, \quad\text{and}\quad \langle \Delta W \otimes \Delta W \rangle = I\,\Delta t. \tag{30}
$$
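The two moment conditions of Eq. (30) can be illustrated by sampling. The following is a minimal sketch, assuming independent Gaussian components drawn with Python's standard `random` module; the step size, dimension, sample count and seed are arbitrary illustrative choices.

```python
import math
import random

def wiener_increments(n_steps, dt, dim, rng):
    # Each component of each increment is an independent Gaussian
    # with zero mean and variance dt.
    sigma = math.sqrt(dt)
    return [[rng.gauss(0.0, sigma) for _ in range(dim)]
            for _ in range(n_steps)]

rng = random.Random(12345)  # fixed seed so the sketch is reproducible
dt, dim, n_steps = 0.01, 2, 100000
dW = wiener_increments(n_steps, dt, dim, rng)

# Sample estimate of <dW>, to be compared with 0.
mean = [sum(inc[i] for inc in dW) / n_steps for i in range(dim)]
# Sample estimate of <dW (x) dW>, to be compared with I * dt.
second_moment = [[sum(inc[i] * inc[j] for inc in dW) / n_steps
                  for j in range(dim)] for i in range(dim)]
print(mean, second_moment)
```

The off-diagonal entries of `second_moment` vanish up to sampling noise, which reflects the identity matrix I on the right-hand side of Eq. (30).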



Increments over non-overlapping time intervals are
statistically independent. The stochastic term on the rhs
of Eq. (29) should be read using the so-called
Ito interpretation (see, e.g., [11]). This means that the
expectation value of the increment, averaged over initial
conditions $\Gamma_0$, is zero. This is a consequence of Eq. (HI
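When integrating Eq. (26) over $\Delta t$ one finds

$$
\begin{aligned}
\Delta X ={}& \int_0^{\Delta t} dt\, E(\dot{X}|X_t) \\
&+ \int_0^{\Delta t} ds\, \exp[-S(X_s)]\,\frac{\partial}{\partial X_s}\cdot
\left(\exp[S(X_s)] \int_0^{\Delta t - s} ds'\,
E\big(f^{\rm fluct}(\cdot,0)\,f^{\rm fluct}(\cdot,s')\,\big|\,X_s\big)\right) \\
&+ \int_0^{\Delta t} dt\, f^{\rm fluct}(\Gamma_0, t).
\end{aligned}
\tag{31}
$$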

Since the fluctuating force is decorrelating quickly one 
finds that the integral 



$$
D(X_s) = \int_0^{\Delta t - s} ds'\,
E\big(f^{\rm fluct}(\cdot,0)\,f^{\rm fluct}(\cdot,s')\,\big|\,X_s\big), \tag{32}
$$



does not depend on s, other than through $X_s$, except
when $s = \Delta t - O(\tau)$. The diffusion coefficient as defined
by Eq. (29) can be computed from the variance of the
fluctuating term. This gives



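$$
\tilde{D}(X_0) = \frac{1}{2\Delta t}\int_0^{\Delta t}\!\!\int_0^{\Delta t} ds\,ds'\,
E\big(f^{\rm fluct}(\cdot,s)\,f^{\rm fluct}(\cdot,s')\,\big|\,X_0\big). \tag{33}
$$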



Comparing Eq. (32) and Eq. (33) one sees that both
quantities are similar, but not exactly the same.

To be able to establish, rigorously, the usual
fluctuation-dissipation relation one needs to introduce
the extra assumption that $X_t$ is a slow variable. During
a few decorrelation times $\tau$, $X_t$ has hardly changed.
This assumption gives that

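$$
\begin{aligned}
E\big(f^{\rm fluct}(\cdot,0)\,f^{\rm fluct}(\cdot,s')\,\big|\,X_s\big)
&\approx E\big(f^{\rm fluct}(\cdot,0)\,f^{\rm fluct}(\cdot,s')\,\big|\,X_0\big) \\
&\approx E\big(f^{\rm fluct}(\cdot,s'')\,f^{\rm fluct}(\cdot,s'+s'')\,\big|\,X_s\big),
\end{aligned}
\tag{34}
$$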

for $s, s', s'' = O(\Delta t)$. Using this time-translation
invariance, and computing the expectation value of the
variance, Eq. (33), gives



$$
2\tilde{D}(X_0) = D(X_0) + D^T(X_0). \tag{35}
$$



So, if D is symmetric both are equal. In the general case, 
however, one can write 



$$
D = \tilde{D} + A, \tag{36}
$$

where A is anti-symmetric, i.e., $A^T = -A$.
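The decomposition of Eqs. (35) and (36) is the ordinary split of a matrix into its symmetric and anti-symmetric parts. A minimal sketch follows; the 2x2 matrix is a made-up example and the function name is hypothetical.

```python
def split_symmetric_antisymmetric(d):
    # D_tilde = (D + D^T)/2 is the symmetric part, cf. Eq. (35);
    # A = (D - D^T)/2 is the anti-symmetric part, cf. Eq. (36).
    n = len(d)
    d_tilde = [[0.5 * (d[i][j] + d[j][i]) for j in range(n)] for i in range(n)]
    a = [[0.5 * (d[i][j] - d[j][i]) for j in range(n)] for i in range(n)]
    return d_tilde, a

# Hypothetical example matrix, not taken from the paper.
D = [[2.0, 0.7],
     [-0.3, 1.0]]
D_tilde, A = split_symmetric_antisymmetric(D)
print(D_tilde)  # symmetric part, which sets the noise amplitude
print(A)        # anti-symmetric part
```

Only `D_tilde` enters the fluctuating term of the stochastic model; `A` contributes to the drift alone.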

If one imagines X as a vector of values then D is a
matrix. For suitably chosen macroscopic variables
individual elements on both sides of its diagonal are either
symmetric ($D_{ij} = D_{ji}$) or anti-symmetric ($D_{ij} = -D_{ji}$).
Suitable means that a time-reversal operation is well
defined on both the microscopic and macroscopic space. If
$\Gamma^*$ is the time-reversed microscopic state corresponding
to $\Gamma$, then the operation $X(\Gamma^*) = X^*(\Gamma)$ should make
sense. Assuming the time-translation invariance as given
in Eq. (34) one finds that

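$$
\begin{aligned}
E\big(f^{\rm fluct}(\cdot,0)\,f^{\rm fluct}(\cdot,\Delta s)\,\big|\,X\big)
&= E\big(f^{\rm fluct}(\cdot,-\Delta s)\,f^{\rm fluct}(\cdot,0)\,\big|\,X\big) \\
&= E\big(f^{\rm fluct}(\cdot,\Delta s)\,f^{\rm fluct}(\cdot,0)\,\big|\,X^*\big) \\
&= E\big(f^{\rm fluct}(\cdot,0)\,f^{\rm fluct}(\cdot,\Delta s)\,\big|\,X^*\big)^T.
\end{aligned}
\tag{37}
$$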

Depending on the parity under time reversal of the
individual components of the fluctuating contributions, i.e.,
$f_i^{\rm fluct}(X^*) = \pm f_i^{\rm fluct}(X)$, cross correlations of terms with
opposite parities contribute to the anti-symmetric matrix
A, all others to $\tilde{D}$.

These anti-symmetric contributions correspond to the 
extension of Onsager's reciprocal principle by Casimir 
[lH . Clearly, D can always be decomposed in a symmet- 
ric and antisymmetric part. From Eq. (35) it follows that
the symmetric part is directly related to the fluctuations
in the stochastic differential equation. The Casimir
relations can be used to show that the antisymmetric part is zero
(or at least to determine its rank). If A is nonzero, it
makes sense to choose a preferred macroscopic coordinate
system such that A only acts on a small subspace
of the vector space.

Under the assumption of rapidly decorrelating fluctuations,
the generalized Langevin equation, Eq. (26), can
be simplified to a stochastic differential equation. First
one considers Eq. (26) for $\Delta t \gg \tau$. Then one uses the
modeling assumptions that $X_t$ is slow, that the macroscopic
variables are chosen such that one can perform time-reversal,
and that the fluctuating term can be modeled as Gaussian
noise. One obtains a stochastic difference equation
(strictly valid after integration over $\Delta t \gg \tau$), which can
be well approximated by the stochastic differential equation

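$$
\begin{aligned}
dX_t &= E(\dot{X}|X_t)\,dt
 + \exp[-S]\,\frac{\partial}{\partial X_t}\cdot\big[D\,\exp[S]\big]\,dt
 + \sqrt{2\tilde{D}}\cdot dW_t \\
&= E(\dot{X}|X_t)\,dt
 + D^T\cdot\frac{\partial S}{\partial X_t}\,dt
 + \frac{\partial}{\partial X_t}\cdot D\,dt
 + \sqrt{2\tilde{D}}\cdot dW_t.
\end{aligned}
\tag{38}
$$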



This stochastic differential equation has three main
contributions: an instantaneous (reversible) part, an
irreversible (dissipative) contribution and a fluctuating
(random) part. The first term on the rhs gives the
instantaneous change of $X_t$ averaged over all possible microstates
consistent with this state. The last term models the
fluctuations with respect to this average motion. On time
scales larger than the decorrelation time, $\tau$, this is effectively
modeled by means of a white noise, or Wiener, process.
The irreversible term gives a drift toward macrostates
with higher entropy. This bias can be explained
intuitively by the argument that these regions correspond to
a larger micro-phase-space-volume. Therefore the
"residence time" is longer.
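The drift toward higher entropy can be made concrete with a one-dimensional Euler-Maruyama integration of Eq. (38). The sketch below rests on assumptions that are not part of the derivation: a single variable, no reversible term (E(dX/dt|X) = 0), a constant D (so the divergence term vanishes and A = 0), and a hypothetical entropy S(X) = -X^2/2. Under these assumptions the stationary density is proportional to exp[S], i.e., Gaussian with unit variance.

```python
import math
import random

def simulate(n_steps, dt, d_coef, ds_dx, x0, rng):
    # Euler-Maruyama steps (Ito) for dX = D S'(X) dt + sqrt(2 D) dW.
    x = x0
    path = []
    noise_scale = math.sqrt(2.0 * d_coef * dt)
    for _ in range(n_steps):
        x += d_coef * ds_dx(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
# Hypothetical entropy S(X) = -X^2/2, hence S'(X) = -X.
path = simulate(n_steps=200000, dt=0.01, d_coef=1.0,
                ds_dx=lambda x: -x, x0=3.0, rng=rng)
samples = path[2000:]  # discard the initial relaxation away from x0
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, variance)
```

Starting far from the entropy maximum, the trajectory relaxes toward it and then fluctuates around it; the sample variance approaches 1 up to statistical and time-step errors.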



V. GENERIC EQUATIONS 

This Langevin equation, Eq. (38), is very similar to the
governing equation of the GENERIC formalism. In fact
it is the same except the GENERIC formalism imposes
extra structure on the matrices (or more general
operators) arising in the formula. The GENERIC equation
has the form, [11],



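$$
dX_t = L\cdot\frac{\partial H}{\partial X_t}\,dt
 + D\cdot\frac{\partial S}{\partial X_t}\,dt
 + \frac{\partial}{\partial X_t}\cdot D\,dt
 + \sqrt{2D}\cdot dW_t. \tag{39}
$$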

The strictest assumption of the GENERIC formalism
is the structure of the reversible part. In the terminology
of the formalism it is a two-generator equation where
the generators are energy H and entropy S. The motivation
to introduce the "Poisson matrix" L and the energy
H is that this gives a nice structure to the equations.
In fact, within the GENERIC formalism the reversible
part obeys Hamiltonian dynamics, or more generally obeys
the underlying geometric structure, the Poisson structure.
Within the framework two additional degeneracy
conditions, that will be discussed below, are also obeyed.
We write the microscopic dynamics as

$$
\dot{\Gamma} = L^{\rm micro}\cdot\frac{\partial H^{\rm micro}(\Gamma)}{\partial \Gamma}. \tag{40}
$$



The Poisson structure can be expressed in terms of prop- 
erties of the Poisson matrix as 



$$
L^{\rm micro} = -(L^{\rm micro})^T \quad\text{and}\quad
L^{\rm micro}\cdot\frac{\partial L^{\rm micro}}{\partial \Gamma} = 0. \tag{41}
$$



The GENERIC equations can be found if one assumes
that the Hamiltonian of the system can be expressed in terms
of macroscopic variables,

$$
H^{\rm micro}(\Gamma) = H(X(\Gamma)). \tag{42}
$$



In that case, 



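$$
\begin{aligned}
E(\dot{X}|X_t)
&= E\left(\frac{\partial X}{\partial \Gamma}\cdot L^{\rm micro}\cdot
\frac{\partial H^{\rm micro}(\Gamma)}{\partial \Gamma}\,\Big|\,X_t\right) \\
&= E\left(\frac{\partial X}{\partial \Gamma}\cdot L^{\rm micro}\cdot
\left(\frac{\partial X}{\partial \Gamma}\right)^{T}\Big|\,X_t\right)\cdot
\frac{\partial H}{\partial X_t}
 = L\cdot\frac{\partial H}{\partial X_t}.
\end{aligned}
\tag{43}
$$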

It is clear that the coarse-grained Poisson matrix is
anti-symmetric. In many theories the form Eq. (42) is put in
by design. Usually the energy can be approximated by
using quantities such as the kinetic energy of the
center-of-mass of a group of atoms etc. The remaining energy
not covered by these contributions is collected into a
new variable, namely, the internal energy.



Not all equations are easily put into the GENERIC 
form. The most elementary example is the Brownian 
motion of a particle in a background velocity field. If the 
motion is described by considering position only, i.e., the 
momentum variable is eliminated, then 



$$
dX_t = v(X_t)\,dt + \sqrt{2D}\cdot dW_t. \tag{44}
$$



It is hard, when not allowed to use a momentum of the
particle as a variable, to write down its energy. It is clear,
however, that the expectation value of the instantaneous
velocity of a Brownian particle is the background fluid
velocity, $E(\dot{X}|X) = v(X)$.

Let's assume that a macroscopic Hamiltonian can be
defined, e.g., by introducing an internal energy. The second
property in Eq. (41), that is equivalent to the Jacobi
property of Poisson brackets, i.e.,



$$
L\cdot\frac{\partial L}{\partial X} = 0, \tag{45}
$$



is less trivial to prove in the coarse-grained case. It is
acknowledged by the people involved in the GENERIC
movement that the Jacobi identity cannot be proved in
general (yet) [11, p. 235]. Clearly, the Jacobi identity
is obeyed when L is independent of the macroscopic
state X. This is almost always the case for the classical
macroscopic transport equations such as the
Navier-Stokes equation.

The GENERIC-movement has, however, started a
certification program, [11], to check, among other properties,
whether macroscopic theories have the full Poisson
structure. If a macroscopic equation does not have it,
they claim it is thermodynamically inconsistent. This
severe judgment seems not fully justified since the Jacobi
identity is not proved. An example, where a well-established
equation of motion was supposedly disproved, is
the case of the Doi-Edwards reptation model without
independent alignment [16]. This conclusion was disputed
in [17] and defended in [18]. It is hard to see who is
right and why. We conclude, a bit provocatively, that
for most coarse-graining purposes the rigorous
result $E(\dot{X}|X)$ seems sufficient and the GENERIC
expression, with its assumptions that need not be obeyed,
seems to mainly add confusion.



A. Degeneracy conditions 

Besides the Poisson structure the GENERIC formalism
also prescribes two so-called degeneracy conditions.
Before discussing these we will give a degeneracy condition
that follows directly from the projection operator
formalism. The current operator formalism imposes
a restriction on the instantaneous, or reversible,
contribution. Applying Eq. (21) for A = 1 (and w invariant)
gives,



$$
\exp[-S]\,\frac{\partial}{\partial X_t}\cdot\Big[E(\dot{X}|X_t)\,\exp[S]\Big] = 0. \tag{46}
$$
This is the consequence of Liouville's theorem (or more
generally the existence of an invariant measure) applied
to the macroscopic space. One might interpret it as the
fact that $E(\dot{X}|X_t)$ is "divergence free". Here the
volume-form defined by the entropy can be used in the divergence
definition. This is similar to the occurrence of a term
$\sqrt{\det g}$, where g is a metric, in the Riemannian-geometry
definition of the divergence operator.
This condition can be rewritten as,

$$
E(\dot{X}|X_t)\cdot\frac{\partial S}{\partial X_t}
 = -\frac{\partial}{\partial X_t}\cdot E(\dot{X}|X_t). \tag{47}
$$

Within the GENERIC formalism this equals,



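$$
\left(L\cdot\frac{\partial H}{\partial X_t}\right)\cdot\frac{\partial S}{\partial X_t}
 = -\frac{\partial}{\partial X_t}\cdot\left(L\cdot\frac{\partial H}{\partial X_t}\right). \tag{48}
$$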



By means of the anti-symmetry of matrix L one finds 
that 



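$$
\left[L\cdot\frac{\partial S}{\partial X_t} - \frac{\partial}{\partial X_t}\cdot L\right]
\cdot\frac{\partial H}{\partial X_t} = 0. \tag{49}
$$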



Since, within the GENERIC formalism, the expression
for L is independent of H the bracketed expression itself
equals zero. In the original papers on the GENERIC
formalism, [10], the second term in Eq. (49) was not
included. In the book [11, p. 233] it is found to be
present.

Another instance where Eq. (47) arises is in
(computational) studies where reversible thermostats are used,
[19]. In this case the evolution equation of a microscopic
system is extended by one dissipative term that makes
sure that the total kinetic energy (iso-kinetic thermostat)
or the total energy (ergostat) stays constant. This
dissipative term causes Liouville's theorem not to be obeyed.
Usually arguments connected to the work done upon the
system, related to thermodynamic expressions, are used to
derive the entropy production as given by Eq. (47).

The point of view in the current paper is that the full
system, i.e., the microscopic thermostatted system plus
the environment (thermostat and driving force) obeys
microscopic dynamics in reality. The thermostat and the
out-of-equilibrium driving force are a model of the
environment. This modeled system should be consistent with
the underlying microscopic dynamics. Since no fluctuating
forces are incorporated in the reversible thermostat
model this consistency condition leads to Eq. (47).

Note that this point of view also holds for the possible
anti-symmetric part of D in Eq. (38). When applying
Eq. (46) to the irreversible term in the form of the first line
in Eq. (38), one finds that it is trivially obeyed also for
this term because



$$
\frac{\partial^2}{\partial X\,\partial X} : \big(A\,\exp[S]\big) = 0, \tag{50}
$$



due to the anti-symmetry of A. Note that on the level
of a macroscopic equation, by considering this degeneracy
condition only, it is hard to distinguish between the
contribution of the instantaneous change, i.e., $E(\dot{X}|X)$,
and the contribution of the anti-symmetric part of the
entropy driven term. In "GENERIC-speak" this latter
term is driven by the entropy as generator and not by
the energy. The possibility of a non-zero anti-symmetric
part driven by entropy seems to be missing from the
GENERIC framework.
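The step behind Eq. (50) can be written out in index notation. The following is a short worked version, with summation over the repeated indices i and j implied; the index notation is introduced here for illustration and A may depend on X.

```latex
\begin{align*}
\frac{\partial^2}{\partial X_i\,\partial X_j}\left(A_{ij}\,e^{S}\right)
 &= -\frac{\partial^2}{\partial X_i\,\partial X_j}\left(A_{ji}\,e^{S}\right)
 && \text{(anti-symmetry, } A_{ij} = -A_{ji}\text{)} \\
 &= -\frac{\partial^2}{\partial X_j\,\partial X_i}\left(A_{ij}\,e^{S}\right)
 && \text{(relabel } i \leftrightarrow j\text{)} \\
 &= -\frac{\partial^2}{\partial X_i\,\partial X_j}\left(A_{ij}\,e^{S}\right)
 && \text{(partial derivatives commute)}
\end{align*}
```

The expression equals its own negative and therefore vanishes, independently of the form of S.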

The second degeneracy condition in the GENERIC for- 
malism is 



$$
D\cdot\frac{\partial H}{\partial X} = 0. \tag{51}
$$



This equality ensures that the time derivative of the total
energy is zero. The requirement is, however, stronger
than needed for this purpose. If Eq. (42)
holds then, because of conservation of energy,



$$
0 = i\mathcal{L}H = (i\mathcal{L}X)\cdot\frac{\partial H}{\partial X}. \tag{52}
$$



By means of the definition of the fluctuating force
and the definition of D, Eq. (33), the degeneracy
condition Eq. (51) can be proved.



VI. NON-EQUILIBRIUM THERMODYNAMICS 

The goal of non-equilibrium thermodynamics is to sup- 
ply a description of the time-evolution of a system in 
terms of coarse-grained, meso- or macroscopic, variables. 
The generalized non-linear Langevin equation, after ap- 
proximation for the fluctuating forces, supplies such a 
description. 

In case of the derivation of the stochastic differential
equation, Eq. (38), the approximations are simple.
Fluctuations are micro-reversible and they decorrelate
quickly. The motion in microscopic phase space is
reversible. Because, in macroscopic space, the volume of
microscopic phase space corresponding to a unit volume
of macroscopic space need not be constant, the result is
a bias. The only reason is the mapping. This bias is
toward macroscopic states corresponding to more micro
phase space volume (per unit macro phase space volume).
This is exactly in accordance with the ordinary reasoning
why systems tend toward increasing entropy. The
generalized nonlinear Langevin equation quantifies this
tendency. Therefore equilibrium thermodynamics results
from it.



A. Ergodicity and decorrelation 

Ergodicity is no requirement for Eq. (26) to be valid.
The generalized Langevin equation is formally equivalent
to the microscopic dynamics. Ergodicity arguments
come into play if one wants to approximate Eq. (26), for
example, if one wants to model the fluctuating forces by
means of stochastic processes.
The property that there is one unique invariant
probability measure, defined via Eq. (22), consistent with the
dynamics of a system is called ergodicity (roughly
speaking). Let's call the probability distribution that defines
this measure $\mu_{\rm erg}$. Under certain mild assumptions on
the quantity $A(\Gamma)$, the Birkhoff ergodic theorem can be
derived,

$$
\lim_{T\to\infty}\frac{1}{T}\int_0^T A(\Gamma_t)\,dt = \int A(\Gamma)\,\mu_{\rm erg}(d\Gamma). \tag{53}
$$

This means long time averages are always equal to ensemble
averages using $\mu_{\rm erg}$. If a measure is unique, a single
trajectory connects all points in phase space (except,
possibly, for a set with measure 0). Eq. (22) transports the
$w(\Gamma_0)$ defined at an initial point $\Gamma_0$ to all other points in
phase space.

In the case of classical mechanics there is always one
invariant measure, namely the Liouville measure. This
is usually not a probability measure since it cannot be
normalized. Because of the conservation of energy (and
other quantities) microscopic trajectories are restricted
to constant (total) energy surfaces. Conservation of
energy implies that phase space can be decomposed into
(dynamically) disconnected shells. In this sense the Liouville
measure is not unique.

Note that this non-uniqueness of the invariant measure
should not be taken too seriously. As remarked earlier, also
for the microcanonical system one is often interested in
the change of the system when the energy changes. In
this case the energy is not fixed, but a very slow variable,
so the shells are, in fact, connected.

Usually, in classical mechanics, when discussing ergod- 
icity one considers the ergodicity properties on a (total) 
energy shell. This energy shell is then divided into a sys- 
tem and a heatbath. The heatbath is usually taken to 
be very large. Due to its largeness most of the (accessi- 
ble) microscopic phase space corresponding to it can not 
be sampled within a finite time. Moreover, it is clearly 
not realistic to model such a large system, say a lab- 
oratory with people walking around, as described by a 
microcanonical ensemble. 

Note that the validity of Eq. (26) does not depend on
ergodicity, but its usefulness might in some way depend
on it. Non-uniqueness of the invariant measure is due to
the possibility of decomposing microscopic phase space
into invariant subspaces. A trajectory starting in such a
subspace will always remain in it. If the variables X are
chosen such that they cannot parameterize the invariant
subspaces there might be a lasting dependence on the
initial microscopic state a system starts in.

For example, for the same macrostate X the microstate
$\Gamma$ might be in subspace A or B and will stay there
indefinitely. In the two subspaces the decorrelation might
occur differently. If this is the case it will show up in
the dynamics of X but cannot be modeled on the level
of X. In this case certain components of $f_t^{\rm fluct}$ do not
decorrelate at all. There is a lasting dependence on the
initial microstate.



This non-ergodic behavior is the most extreme case.
In a dynamical situation, depending on which scale one
is looking, there is little difference in not decorrelating
or decorrelating very slowly. In a dynamic theory (local)
equilibrium is only an approximation. What is important,
for the usefulness of Eq. (26) in devising approximate
equations, is the decorrelation behavior of the
fluctuation terms $f_t^{\rm fluct}$. The ideal choice of variables X is
such that decorrelation of $f_t^{\rm fluct}$ occurs on a time scale $\tau$
that is small compared to the time scales on which X changes
significantly. If the microstates remain for a long time in
or near subspaces, and these subspaces are not described
well by the macroscopic variables X, then one might see
a breakdown of fluctuation-dissipation relations.

If the subspaces A and B are dynamically disconnected,
or far apart, there is a (long) lasting dependence
(of the microstate) on the initial microstate, but this
might not be apparent on the macro level at all. For
typical initial states in either of the two subspaces,
corresponding to the same X, decorrelation might occur in a similar
way. In this case the dependence does not show up in
the correlation behavior of the fluctuations and is of no
importance for the evolution at the macroscopic level.

If this is the case the (ensemble) expectation value of
the correlation of $f_t^{\rm fluct}$ appearing in the generalized
Langevin equation will be very close to the time average
starting in any microstate consistent with macrostate
$X_{t-s}$. This means that fluctuations obey the
fluctuation-dissipation relation.

B. Consistent microstates 

For the usual many-particle systems the general
Birkhoff ergodic theorem is of little use since an
extremely long time T is needed to sample the microscopic
measure $\mu_{\rm erg}$ sufficiently accurately. In general, however,
one is interested in knowing average values of macroscopic
observables, or the dynamics of macroscopic states. If
many, possibly widely separated, microstates correspond
to the same macrostate one does not necessarily need to
sample the full microscopic space to sample the macroscopic
space well. Generally, for big systems, one only
needs to sample a tiny amount of microscopic phase
space. To illustrate this point let's make a digression.

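As an example let's consider a box of volume V, filled
with N ideal gas particles and energy E. According to
the entropy definition in this paper one has

$$
\begin{aligned}
\exp[S(E,N,V)] &= \int \prod_{i=1}^{N} d\Gamma_i\;
\delta\left(E - \frac{1}{2m}\sum_{i=1}^{N} p_i^2\right) \\
&= \frac{V^N\,(2\pi m E)^{3N/2}}{\Gamma(3N/2)\,E}.
\end{aligned}
\tag{54}
$$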

Note that in this expression no Planck constant occurs
and no factor $1/N!$ that makes the entropy extensive.
Here E and N are in principle dynamical variables, but
because of the conservation of energy and number of
particles they will, in fact, not change.
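The energy dependence of Eq. (54) can be checked numerically. The sketch below evaluates the logarithm of the right-hand side of Eq. (54) with the log-gamma function and compares a finite-difference slope in E with the analytic value (3N - 2)/(2E); the particle number, volume, energy and the unit mass are arbitrary illustrative choices.

```python
import math

def ideal_gas_entropy(energy, n_particles, volume, mass=1.0):
    # ln of Eq. (54):
    # S = N ln V + (3N/2) ln(2 pi m E) - ln Gamma(3N/2) - ln E
    half_dof = 1.5 * n_particles
    return (n_particles * math.log(volume)
            + half_dof * math.log(2.0 * math.pi * mass * energy)
            - math.lgamma(half_dof)
            - math.log(energy))

def d_entropy_d_energy(energy, n_particles, volume, h=1e-6):
    # Central finite difference with respect to E.
    return (ideal_gas_entropy(energy + h, n_particles, volume)
            - ideal_gas_entropy(energy - h, n_particles, volume)) / (2.0 * h)

N, V, E = 10, 2.0, 5.0  # hypothetical values
numeric_slope = d_entropy_d_energy(E, N, V)
analytic_slope = (3 * N - 2) / (2.0 * E)  # from S(E) = (3N - 2)/2 ln E + const
print(numeric_slope, analytic_slope)
```

The factor 1/E in Eq. (54) is what lowers the exponent from 3N/2 to (3N - 2)/2; it is negligible for macroscopic N but not for small subsystems.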
Now consider space divided into m cells with volumes
$V_i$. The macroscopic state in each cell is characterized
by the energy $E_i$ and the number of particles $N_i$.
The macroscopic system is then characterized by
$E = (E_1, \dots, E_m)$, $N = (N_1, \dots, N_m)$,
$V = (V_1, \dots, V_m)$. The total entropy of the macroscopic
system characterized by these variables is

$$
\exp[S(E,N,V)] = \frac{N!}{N_1!\cdots N_m!}\,\prod_{i=1}^{m}\exp[S(E_i,N_i,V_i)]. \tag{55}
$$

Here the multinomial factor arises because classical
particles are distinguishable! The entropy as calculated
for one cell, Eq. (54), considers N specific particles.

If one puts the particles labeled 1 to $N_1$ into cell 1,
labels $N_1 + 1, \dots, N_1 + N_2$ into cell 2 etc. and computes
the microscopic phase space around energy states $E_1$ to $E_m$
one finds the product of $\exp[S(E_i, N_i, V_i)]$'s in Eq. (55).
At this point one has, however, only computed one out of
the $N!/(N_1!\cdots N_m!)$ permutations. To compute all
microscopic phase space volume consistent with the macroscopic
state one has to multiply the contribution of a
specific configuration with the number of permutations.
Clearly, if a system is large, in any reasonable time only a
small portion of all possible permutations will be visited
by the path through microscopic phase space. For
computation of expectation values this is no problem if the
macroscopic state depends on the number of particles in
each cell, but not on which specific particle is where.
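To get a feeling for the size of the multinomial factor in Eq. (55), its logarithm can be evaluated with log-gamma functions. The sketch below uses a hypothetical helper `log_multinomial` and made-up occupation numbers.

```python
import math

def log_multinomial(occupations):
    # ln[ N! / (N_1! ... N_m!) ] with N the sum of the occupations,
    # evaluated via log-gamma to avoid huge intermediate factorials.
    total = sum(occupations)
    return math.lgamma(total + 1) - sum(math.lgamma(n + 1) for n in occupations)

# Small check: 4 particles over two cells with 2 each gives 4!/(2! 2!).
small = round(math.exp(log_multinomial([2, 2])))
# Hypothetical larger example: 10^6 particles spread evenly over 10 cells.
big = log_multinomial([100000] * 10)
print(small, big)
```

For the larger example the logarithm of the number of permutations is of order N ln m, i.e., extensive in N. This is why only a vanishing fraction of the permutations can be visited in any reasonable time.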

The reasoning why a factor $1/N_i!$ arises in Eq. (55) is
different from the orthodox reasoning. Usually one reasons
that it occurs because particles are indistinguishable.
The factor is then introduced in Eq. (54), because
one would otherwise be counting one microscopic state
multiple times. The current argument, that all permutations
of particles as distributed over the cells, consistent
with the occupations, should be counted, can also be
found in [20, chapt. V].

The conventional approach resolves the Gibbs paradox.
In classical mechanics the notion of indistinguishability
is, however, not well established. Are red and green
particles distinguishable for a color blind person? And what
about properties we are currently, but possibly not in the
future, blind to? These kinds of questions on the conventional
approach are addressed in [21]. The approach used
is that, when computing the entropy, one distinguishes
between red and green when the numbers of red and green
particles in a cell are macroscopic variables. This gives
a combinatorial factor that distinguishes between colors.
If one chooses to take color into account in the macroscopic
description the difference is irrelevant. The choice
of macroscopic variables determines the entropy. Clearly,
if one does not take into account macroscopic variables
that are relevant this does show up as a lasting dependence
on an initial microscopic state. Of course, in practice,
one chooses variables that one observes or measures
in the phenomena one wants to describe.

This interpretation of the factorial factor in the entropy
definition also gives a seemingly conceptual difficulty.
We accept now that particles are distinguishable.
All microscopic states corresponding to a macroscopic
state are taken into account by the entropy definition.
This means that for two particles far apart, the microscopic
state corresponding to the interchanging of the two
will never be reached in a reasonable macroscopic time.
Nevertheless both microstates contribute to the entropy.
Note, however, that the projection operator formalism
does not demand this kind of "ergodic" properties. The
only thing that is really important is that, when interchanging
the particles, fluctuations decay in a similar way. This
is for sure the case if this interchange leaves the microscopic
Hamiltonian invariant.

Another objection one might have is that one does not
want to divide a system in cells. One just wants to consider
one system. However, if the number of particles can
change (without chemical reactions) one needs to model
an open system. An open system consists of two "cells":
the system and the environment.

In general, to compute the entropy, one should take 
into account all microstates consistent with a macrostate 
and not only the states actually sampled. The ergodic 
point of view that entropy has to do with phase space 
visited in a certain time is wrong. 

An illustrative example for this is the entropy of a
high molecular weight, entangled polymer melt. Upon
deformation the polymer chains get stretched (on average).
Subsequently the polymer conformations will try
to relax towards equilibrium. Initially this relaxation is
quick but soon polymer molecules will start feeling their
neighbors. Because the melt is entangled with them the
fast modes of relaxation are halted. According to the
theory of Doi and Edwards [22] conformations will be
confined to a tube-like region. The contour length and
the cross-sectional area of the tube are independent of the
deformation. A polymer can only relax further by escaping
the tube (so-called reptation). So, there is a two-step
process of relaxation, namely, a fast process of the chain
inside the tube and a slow one of the tube itself. There
is a big gap between the characteristic time scales.

Here comes the point. Suppose after a step-strain and
subsequent fast relaxation inside the tube one characterizes
the state by the strain. One wants to know the entropy
as a function of the strain. Since the tubes are not
relaxed yet one might think the entropy can be computed
from the number of chain conformations (or, rather,
microscopic phase space volume) sampled by a chain inside a
tube. Since the contour length and radius of the tube
do not change with deformation one finds this phase
space volume is independent of strain. The mistake is
that, in fact, also the number of tube configurations
consistent with the strain deformation should be taken into
account, even if they are not sampled ergodically.

Ergodic-reasoning on the number of micro-states sam- 
pled gives the incorrect result. The correct result is found 
when considering all states consistent with the macro- 
scopic description. 
C. Extensivity

For systems that are totally independent the entropy
is an additive quantity, since the volume of phase space
corresponding to the total system is the product of the
volumes of the individual systems. For systems that interact
weakly this is still the case, if the macroscopic quantities
are quantities of these subsystems, say the energy of the
systems.

A special situation occurs when the systems can interchange
particles, because the number of possible microstates
consistent with the occupation numbers, N, of cells in space
increases by an immense factor due to all possible
permutations, see Eq. (55). The additivity rule is maintained
when using $S(E_i, N_i, V_i) - \ln N_i!$.

We will discuss the case where the variable is total energy,
but it is valid for any (conserved or non-conserved)
extensive additive quantity. If one wants to know the
entropy as function of the total energy of all subsystems
combined the systems are no longer independent. The
energy that leaves one subsystem has to enter another
one. The total entropy for this situation is computed
in Appendix A. The total microscopic phase space per
unit total energy is given by Eq. (A10). The entropy to
leading order in the number of subsystems M is then



$$
S(E_{\rm tot}) = M\left[S(E) - \frac{1}{2}\ln\left(\frac{-S''(E)}{2\pi}\right)\right] + o(M), \tag{56}
$$



where S(E) is the entropy of a subsystem at mean energy
$E = E_{\rm tot}/M$. The small o-notation is used to indicate
weaker than linear terms. Possibly surprising for some is
that even in the leading order of M a second term, besides
S(E), is present. This term is negligible if the subsystem
itself is already macroscopic, but otherwise not. It is
instructive to consider the case of the ideal gas entropy
dependence on E. For a subsystem with N particles,
according to Eq. (54), $S(E) = \frac{3N-2}{2}\ln E$
(plus constants independent of E). Inserting this relation
into Eq. (56) gives
$$
\begin{aligned}
S(E_{\rm tot}) &= M\left[\frac{3N-2}{2}\ln E + \ln E\right] + \dots \\
&= \frac{3NM}{2}\,\ln E_{\rm tot} + \dots
\end{aligned}
\tag{57}
$$



Here the terms that are left out are independent of the
energy. If one would assume that the total entropy is
simply M S(E) one would only find the first term in the
first line of Eq. (57). If a subsystem with a small number
of particles, N, was chosen then, in the limit of large M,
one would find the wrong value.
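The size of the error made by the naive estimate M S(E) can be seen by comparing the coefficients of ln E for the ideal gas. The sketch below assumes, for illustration, that the M subsystems exchange only energy, so that the combined system is an ideal gas of N M particles and Eq. (54) applies to it directly; the function names and the values of N and M are hypothetical.

```python
def exact_exponent(n_particles, n_subsystems):
    # Coefficient of ln E_tot for one ideal gas of N*M particles, from Eq. (54).
    return (3 * n_particles * n_subsystems - 2) / 2.0

def naive_exponent(n_particles, n_subsystems):
    # Coefficient obtained when assuming S(E_tot) = M S(E).
    return n_subsystems * (3 * n_particles - 2) / 2.0

def corrected_exponent(n_particles, n_subsystems):
    # Eq. (56): the term -(1/2) ln(-S''(E) / (2 pi)) adds ln E per subsystem.
    return naive_exponent(n_particles, n_subsystems) + n_subsystems

N, M = 1, 1000  # hypothetical: many single-particle subsystems
exact = exact_exponent(N, M)
naive = naive_exponent(N, M)
corrected = corrected_exponent(N, M)
print(exact, naive, corrected)
```

With single-particle subsystems the naive coefficient is off by a factor of about three, while the corrected one differs from the exact value only by a term that does not grow with M, consistent with the o(M) remainder in Eq. (56).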

When describing macroscopic systems one usually
makes use of densities. For large enough, homogeneous,
systems entropy becomes to a good approximation
extensive,

$$
S(X) = M\,s(X/M) + o(M). \tag{58}
$$

Here the entropy density, as discussed above, cannot
be exactly identified with the entropy of a subsystem.
The conceptual approach to macroscopic equations is to
consider macroscopically large spatial "cells". These
volumes contain a large number of weakly interacting
subsystems. If macroscopic quantities change little from one
cell to the next, one can approximately describe this by
continuously varying (density) fields.

The primary quantities, however, are still the values
of total (or averaged) quantities inside the underlying
cells of macroscopic magnitude. If X is the macroscopic
quantity under consideration in the cell, one can define
a local density x = X/M. According to the coordinate
transformation rule, Eq. (28), combined with Eq. (58),

$$
S_{\rm cell}(x) = S_{\rm cell}(X) + \ln M = M\,s(x) + o(M). \tag{59}
$$

The important thing to notice is that here M is still
present. The reason is that although one considers a
density it characterizes the cell consisting of M subsystems.
Contrary to common belief, in the transition to
densities one does not lose the (sub)system size dependence.
Often, when treating the thermodynamic limit,
e.g., in large deviation theory, one calls s(x), instead of
S(X), the entropy.

If the system size M is large enough, it can be thought
of as consisting of independent (weakly dependent) subsystems.
The M subsystems that constitute a cell fluctuate
independently. Therefore, the fluctuations for, e.g., a
density $x = X_{\rm tot}/M$ scale as,

$$
D \propto \frac{1}{M}. \tag{60}
$$



If one considers Eq. (38), the product $D \cdot dS/dx$ will
become independent of M. The divergence term of D and
the fluctuating term will vanish for $M \to \infty$. In the case
of macroscopic equations one implicitly uses the following
reasoning. First one assumes that quantities vary
slowly such that one can consider a "discretization" in
macroscopically large cells. Because M is large, the
fluctuating term (and the divergence term) in the stochastic
differential equation can be neglected and one obtains an
ordinary differential equation. Because this equation is
independent of the cell size (no M dependence), one can
introduce smooth fields and take $M \to 0$. Although one
is now considering infinitesimally small cells, one must keep
in mind the route taken to arrive here.
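
The 1/M scaling of density fluctuations can be illustrated with a small sampling experiment. This is only a sketch under stated assumptions: each of the M subsystems contributes an independent uniform(0, 1) value (any distribution with finite variance would serve), and the sample variance of x = X_tot/M stands in for the strength of the fluctuations in Eq. (60). The function name is hypothetical.

```python
import random
import statistics

def density_variance(m, n_samples=2000, seed=0):
    """Sample variance of x = X_tot / M for M independent subsystems.

    Assumption: each subsystem contributes an independent uniform(0, 1)
    value; this toy model is not taken from the text.
    """
    rng = random.Random(seed)
    densities = [
        sum(rng.random() for _ in range(m)) / m
        for _ in range(n_samples)
    ]
    return statistics.variance(densities)

if __name__ == "__main__":
    for m in (10, 40, 160):
        var = density_variance(m)
        # m * var should stay close to 1/12, the variance of uniform(0, 1).
        print(m, var, m * var)
```

Quadrupling M should reduce the variance of the density by roughly a factor of four, while the product of M and the variance stays roughly constant.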

The interest of the author is mainly in simulations on
a mesoscopic level. For many applications on the
macroscopic level one can assume an extensive entropy. When
considering smaller scales, fluctuations will start to play a
role. Going from the macroscopic level to the mesoscopic
one, fluctuations will initially scale as in Eq. (60) and the
entropy can still be taken to be extensive. At still smaller
scales the extensivity breaks down. Here one needs to
resort to the microcanonical definition of entropy, or at
least take non-extensive contributions (such as interfacial
interaction) into account.

D. Other entropy expressions

It is well accepted that the microcanonical ensemble is 
more elementary than the macrocanonical one. There is 
more debate on the entropy definitions. Some researchers 
believe the microcanonical definition is elementary, oth- 
ers think microcanonical and macrocanonical are on the 
same footing (and therefore entropy is only well defined in 
the thermodynamic limit), still others have a preference 
for Gibbs entropy (or information theoretical Shannon- 
Jaynes entropy). 

To a certain extent this seems a matter of personal
preference, since ensembles are equivalent. This
equivalence holds, however, only when the thermodynamic
limit is valid. Even then, equivalence can only be
established if certain requirements are obeyed. Mainly,
interactions have to be short-ranged with a finite attractive
part. The equivalence of ensembles in the thermodynamic
limit, and the requirements the potential has to
obey, are well established [3, 0].

The modern approach to the equivalence of ensembles
in the thermodynamic limit is the theory of large
deviations. In unsophisticated terms the basic premise
is that (for short-range interactions) large systems can
be divided into subsystems that interact only weakly. In
this case one can do statistics and count the number of
subsystems that are in a certain state.

One of the candidates for the fundamental definition
of (thermodynamic) entropy is the Gibbs entropy,
i.e., the integral of $-\rho \log \rho$. For example, the
information theoretic (MaxEnt) approach to classical mechanics
[24] uses this entropy as starting point. The case for the
information-theoretic entropy definition on the basis of
a measure of uncertainty [25] is quite strong. The
information entropy is often defined in an axiomatic way
[25, 26]. Implicit in the definition [25] is that entropy
is additive. This can be shown to follow from the fact
that entropy maximizes uncertainty constrained by prior
information [26]. The main assumption is, therefore,
the Bayesian probability interpretation combined with a
maximization procedure.

In the case of continuous distributions only relative
(Gibbs) entropy, i.e., with respect to a specified
measure, can be rigorously defined. Therefore one first needs
to establish the origin of this measure. In classical
mechanics this measure is the Liouville measure. So, before
one can use the Gibbs or Shannon-Jaynes entropy,
one needs to argue why this measure can be used, e.g.,
by a reasoning based on ergodicity.

If one uses a rigid constraint (such as total energy
fixed) one finds from the MaxEnt approach the
microcanonical ensemble [41]. Therefore the microcanonical
entropy is sometimes said to be a special case of the Gibbs
entropy. This is not such a strong argument because the
Liouville measure and the constraint to the
iso-energy shell are already put in as ingredients. Usually the
information entropy is maximized using expectation
values, such as average energies, as constraints. When using
an expectation value as constraint for a conserved
quantity such as the energy, one implicitly states that the
system is open.

When trying to define the Gibbs entropy in more
physical terms one inevitably ends up deriving
information entropy as a limit of a multinomial distribution.
(Even Jaynes does this, see [13, p. 351].) In appendix
B such a derivation is given. The main assumption is
that there are many weakly interacting subsystems. The
density of states $\rho$ should be interpreted as counting the
number of subsystems in a certain state. This is also
the way relative entropy appears in the theory of large
deviations [23]. The total entropy for, e.g., constrained
total energy can be computed by a functional
integration over all possible ensembles $\rho$. The maximization
principle arises because of a saddle-point approximation
to this integral.

Defining the Gibbs entropy for the instantaneous
phase-space distribution $\rho(\Gamma)$ for an isolated system is
without foundation. Therefore the well-known fact that,
as a consequence of Liouville's theorem, the Gibbs
entropy is a constant of motion poses no paradox.
Sometimes a coarse-grained Gibbs entropy is defined to
circumvent this perceived paradox. Instead of the
distribution $\rho$, following from the classical Liouville equation, a
smoothed one, $\bar{\rho}$, is used. The motivation for this might
be a quantum mechanical reasoning that the volume of
elementary cells needs to be larger than $h^{3N}$. The smoothing
causes diffusion in phase space, which gives rise to an
increase in Gibbs entropy. This coarse-graining procedure
is not well defined and rather ad hoc. A critique of the
approach can be found in, e.g., [28].

Note that in entropy expressions in, e.g., mean-field
theories some terms are often called Gibbs entropies
although strictly speaking they are not. Instead of a density that
counts the number of subsystems there is a density that
counts the number of particles in a region in space. In
that case $-\rho \log \rho$ arises from the approximation of the
multinomial in Eq. (55) [42]. This kind of entropy, which
really is just an ideal gas entropy, is also used in the
H-entropy defined for the Boltzmann equation for dilute
gases.

We conclude that the Gibbs entropy is a very useful
tool for use in extensive systems. It does not provide a
fundamental definition of entropy, but follows from it if
certain requirements are obeyed. If systems are too small,
and clearly not extensive, fluctuations play a large role and
the maximization procedure is not a good approximation.

In the thermodynamic limit, under the conditions
given in [l|, the thermodynamic entropy will always
be a concave function of energy. For finite systems there
might be a "convex intruder". In this case the total
entropy of system plus heat bath will be bimodal for some
temperature interval, and there is no complete
equivalence between the microcanonical ensemble and
the macrocanonical ensemble [30]. This equivalence
can break down completely if interactions are long-ranged,
such as in the case of gravity. In this case one needs to
resort to the microcanonical ensemble itself, or at least to
approximations better than the canonical ensemble (e.g.,
the Gaussian or generalized canonical ensemble [31]).
Another "fundamental" entropy definition based on 
the microcanonical ensemble can be found in literature, 
e.g., in [H, The textbook references are [H, [35| 

Here, the ordinary thermodynamic entropy is defined as
the logarithm of all phase space volume for states with
energies smaller than E,

$$ \exp[S(E,V)] = \int d\Gamma\, \Theta[E - H(\Gamma;V)], \qquad (61) $$

where $\Theta[\,\cdot\,]$ is the Heaviside step function. The main
reason why one would prefer this definition is that, also for
finite systems,

$$ dS = \frac{1}{T}\, dE + \frac{p}{T}\, dV. \qquad (62) $$

For the entropy based on the microcanonical ensemble,
Eq. (19), the relation only holds in the thermodynamic
limit (see [36] for the equivalence in the thermodynamic
limit). The thermodynamic relation Eq. (62) assumes
entropy is an additive quantity. Since this is only
rigorously valid in the thermodynamic limit, the objection
that Eq. (19) does not obey Eq. (62) is not very
serious. Moreover, it is hard to see why states with energies
that are not attained by the system should be important
for dynamics. Furthermore, generalizations of Eq. (61) to
variables other than E cause trouble if the variable is
not bounded from below, as the energy is.
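
How far the two definitions differ for a finite system can be made concrete for the ideal gas. This is a sketch under stated assumptions: the phase-space volume below energy E is taken proportional to E^(3N/2) (a standard ideal-gas result, not derived in the text), and the microcanonical entropy has the form S(E) = [(3N - 2)/2] ln E as in Eq. (54). Function names are hypothetical.

```python
def inverse_temperature_volume(energy, n_particles):
    """dS/dE for S = ln(phase-space volume below E), Eq. (61).

    Assumption: ideal gas, volume proportional to E**(3N/2).
    """
    return 1.5 * n_particles / energy

def inverse_temperature_surface(energy, n_particles):
    """dS/dE for the microcanonical form S(E) = (3N - 2)/2 ln E, Eq. (54)."""
    return (1.5 * n_particles - 1.0) / energy

def relative_difference(n_particles, energy=1.0):
    """Relative gap between the two inverse temperatures, equal to 2/(3N)."""
    vol = inverse_temperature_volume(energy, n_particles)
    surf = inverse_temperature_surface(energy, n_particles)
    return (vol - surf) / vol

if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, relative_difference(n))
```

The gap between the two inverse temperatures is 2/(3N): it is of order one for a single particle and vanishes in the thermodynamic limit, in line with the statement that the objection is not serious for large systems.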

In his work on small systems [37], Hill discusses their
thermodynamics. Clearly, extensivity is
not valid and therefore the Gibbs-Duhem relation breaks
down. As a tool he introduces a new variable, $\mathcal{N}$, the
number of identical small systems. Identical means that
the small systems are all characterized by the same set of
"extensive" variables (e.g., energy) and intensive
variables (e.g., chemical potential and pressure). For small
systems it makes a difference whether variables such as the
energy are fixed or allowed to fluctuate. In his treatment this
is, however, not the case. He asserts that thermodynamic
relations such as Eq. (62) are still valid. Therefore, in
his treatment there is no difference between systems
characterized by $E = E_t/\mathcal{N}$ (i.e., total energy divided
by the number of independent ensembles) and a
non-fluctuating energy E that is the same for each small system
in the ensemble of $\mathcal{N}$ members.

In a digression on statistical mechanics Hill uses a
Gibbs entropy definition. As discussed above, for
sufficiently small systems, taking the entropy to be equal to
the maximum of the Gibbs entropy is not a valid
assumption. This can be defended for open systems if $\mathcal{N}$ is large.
In one case the total entropy, $S_t$, for an ensemble of, e.g.,
systems that all have the same energy, E, is computed.
Compared to an entropy that depends on E, many possible
states, such as system 1 with energy $E_1$ and system 2
with energy $E_2$, etc., are left out of consideration.
Nevertheless, it is implicitly assumed that both entropies are
the same.



VII. CONCLUSIONS 

In this paper we showed that a rigorous definition of
entropy follows from the derivation of the generalized Langevin
equation using the projection operator formalism. This is a
purely formal procedure. The only physical input to
obtain Eq. (26) is Liouville's theorem. The entropy
definition is close to the Boltzmann definition. The
subtle difference is that the exponential of the entropy is not
the number of states per macrostate, but the volume of
microstates per unit macrostate volume. Entropy can
be fully defined within a classical mechanics framework
without the appearance of any paradoxes that need
quantum mechanical reasoning to resolve.

The entropy definition follows from a projection onto
coarse-grained variables of the Liouville equation
describing the dynamics of the system. No equilibrium, or
ergodic, reasoning is used to define entropy. There is a
straight, deductive, route from microscopic dynamics to
(non-)equilibrium thermodynamics. Entropy is in some
sense subjective since it depends on the choice of
variables to describe a system. When one speaks about the
entropy in the setting of equilibrium thermodynamics one
means a very specific one, namely the entropy as function
of energy and volume of a system (that interacts weakly
with its environment). For describing different
phenomena one can choose to compute an entropy as function of
different quantities. Entropy also has an objective quality
since it refers to microscopic phase space volume in a
well-defined way once the macroscopic variables are fixed.

The notion of entropy is always related to dynamic
variables. The reason one wants to know entropy, e.g., as
a function of energy, is that one wants to be able to make
predictions about heat fluxes, i.e., changes of energy. It
makes no sense to discuss the total entropy of a closed system
as function of total energy, for example in the case of the
universe. Entropy only becomes a useful notion if one
divides the system into subsystems and characterizes
each subsystem by macroscopic variables.

Entropy as defined in this paper is not a scalar
quantity. So upon a change of variables extra terms appear.
In the thermodynamic limit these terms are negligible.
This has, however, consequences for small systems. In this
case the current entropy definition deviates from other
ones such as the Gibbs entropy. Because of the
rigorous connection through the Zwanzig projection operator
formalism with microscopic dynamics, the current entropy
definition is proved to be the correct one to use.
Moreover, if one approximates the governing equation with a
stochastic differential equation the transformation rule,
Eq. (25), is essential. Only when the entropy is allowed
to transform in this way does the form of the equation
not change upon a change of coordinates, as follows from
Ito calculus.

In the thermodynamic limit a Gibbs entropy definition
can be deduced from the more fundamental entropy
definition given here. This Gibbs entropy implies a
MaxEnt procedure to compute coarse(r) grained entropies
from the Gibbs entropy. The procedure consists of two
steps. The first is to determine a (constrained) maximum of the
Gibbs entropy. The second is an integration around
this saddle point (in the complex plane). The integral
gives a factor whose variation is negligible only in the
thermodynamic limit.

The Langevin equation poses no restriction on the set
of variables one uses to describe a system. The choice
should be motivated by the problem at hand. What
determines a good choice is the decorrelation behavior of
fluctuations of the macroscopic variables. If they decorrelate
quickly the formal generalized Langevin equation can be
approximated by practically useful (stochastic)
equations.

The current entropy definition is independent of (local) 
equilibrium assumptions. It is therefore suited for non- 
equilibrium modeling. One should not put too much em- 
phasis on ergodicity reasoning. If fluctuations decorrelate 
quickly and in such a way that fluctuation-dissipation re- 
lations are found to be obeyed, then it is no problem that 
only a small part of the microscopic phase space corre- 
sponding to a macroscopic state is sampled. It can easily 
be shown that many microscopic states that contribute 
to the entropy are not approached, even remotely, even if 
one waits a very long time. Here the notion of an equiv- 
alence class, i.e., fluctuations behave similarly in this re- 
mote corner of phase space, is important. 

Entropy does not measure the (logarithm of the number of)
states sampled. It measures all phase space volume consistent
with a macroscopic state. This means that much of the
microscopic phase space that is accounted for in the
entropy might actually not be sampled. In the Langevin
equation differences of entropy are the driving force. This
corresponds to a ratio of volumes. The fact that a part
of phase space is not sampled is not important if the
motion in the (dynamically) disjoint regions is typical
or equivalent. The ratio denotes how much phase space
opens up if the (macro)system evolves in a certain
direction. If this ratio is the same for all equivalent disjoint
regions, no problem arises.

Taking into account only the phase space that is, in some
way, sampled in a characteristic time can lead to erroneous
results. The number of disjoint regions (if this can be
defined at all) might depend on the macroscopic state.
Regions can therefore split up into multiple regions, or
regions might merge, upon a change of the macroscopic
state.

The most common approximate modeling of fluctuations
is to describe them as white noise. The equation
that arises is very close to those in the GENERIC
formalism [11], but slightly more general. One difference is
that the entropy is more rigorously defined in the current
paper. GENERIC also imposes a stricter structure
on the reversible part of the stochastic differential
equation that arises. The reversible part has the energy as a
"generator". To arrive at this result one needs to make
extra approximations, e.g., introduce an internal energy.
Sometimes these approximations are hard to make for
the macroscopic variables chosen, e.g., in the case of a
Brownian particle. Also the assumed Poisson structure
of the reversible part remains unproven. The GENERIC
structure also seems to miss a possible anti-symmetric
part driven by entropy differences that can arise due
to Casimir anti-symmetries. The degeneracy conditions
that are assumed to hold in the GENERIC framework
were proved to follow from the properties of the
generalized non-linear Langevin equation.

The approach used in this paper agrees with the
"typicality" point of view in [38, 39]. Upon coarse-graining,
the equations of motion generate typical paths through
the macroscopic phase space. The entropy needed is the
Boltzmann-Planck entropy since it quantifies this
typicality. The motion is biased toward states corresponding
to a larger microscopic phase space volume (Liouville
measure) because a microscopic path will typically move in
this direction.

In conclusion, the entropy definition deduced in this 
paper is a definition that is generally valid also outside 
the thermodynamic limit and in far from equilibrium sit- 
uations. 



APPENDIX A: ROUTES TO AN EXTENSIVE 
ENTROPY 

Let's consider a system divided into many identical
subsystems. The subsystems interact so weakly that, to
a good approximation, the total energy can be written as
a sum of energies. The goal of this appendix is to
compute "the entropy" of such a system. We will present only
an outline. For (mathematical and physical) subtleties
see 0, [|J . The purpose is to show the entropy definition
as discussed in this paper in action and to show
interconnections with alternative entropy expressions (which are
strictly valid only in the thermodynamic limit).

The systems evolve almost independently (not fully
because they exchange energy). Let $\Gamma = (\Gamma_1, \cdots, \Gamma_M)$ be
the microscopic state of the full system and $\Gamma_j$ the
microstates of the subsystems. The subsystems will interact
with their neighbors but this interaction is weak.
Therefore to a good approximation the total energy is

$$ E_{\rm tot}(\Gamma) = \sum_{j=1}^{M} E(\Gamma_j). \qquad (A1) $$

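We want to compute the entropy of the total system as
function of the total energy, i.e.,

$$ \exp[S(E_{\rm tot})] = \int d\Gamma\, \delta[E_{\rm tot}(\Gamma) - E_{\rm tot}] = \int \prod_j d\Gamma_j\, \delta\Big[ \sum_j E(\Gamma_j) - E_{\rm tot} \Big]. \qquad (A2) $$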

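Next we introduce a Fourier representation for the Dirac
delta-function,

$$ \delta(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \exp(i k x), $$

then

$$ \exp[S(E_{\rm tot})] = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk \int \prod_j d\Gamma_j\, \exp\Big[ i k \sum_j E(\Gamma_j) - i k E_{\rm tot} \Big]. \qquad (A3) $$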

Let us introduce a sum of states

$$ Z(\beta) = \int d\hat{\Gamma}\, \exp[-\beta E(\hat{\Gamma})], \qquad (A4) $$

where the integral runs over the microstates $\hat{\Gamma}$ of a single
subsystem and $\beta$ can be a complex number. For finite systems,
with $E(\hat{\Gamma})$ well behaved, $Z(\beta)$ is analytic everywhere on
${\rm Re}(\beta) > 0$. Besides this, $Z(\beta)$ for real $\beta > 0$ will
always be positive. Assuming that $Z(\beta)$ is analytic on the
half-plane ${\rm Re}(\beta) > 0$ and on this half-plane decays rapidly enough
when $|\beta| \to \infty$, one can change the path of integration
from running along the imaginary axis to running along $\beta - i k$.
Using definition Eq. (A4) in Eq. (A2) gives

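$$ \begin{aligned} \exp[S(E_{\rm tot})] &= \frac{1}{2\pi} \int_{-\infty}^{\infty} dk \int \prod_j d\Gamma_j\, \exp\Big[ i k \sum_j E(\Gamma_j) - i k E_{\rm tot} \Big] \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \exp[ M \ln Z(\beta - i k) + (\beta - i k) E_{\rm tot} ]. \end{aligned} \qquad (A5) $$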

Here $\beta$ is still free to choose. A particularly convenient
choice is

$$ \frac{\partial}{\partial \beta} \left( M \ln Z(\beta) + \beta E_{\rm tot} \right) = 0. \qquad (A6) $$



This is a saddle-point condition for the term in the
exponent of Eq. (A5). The $\beta_{\rm sad}$ thus found is the inverse
temperature. One can perform a Taylor expansion up to
second order around the saddle point. This gives

$$ \exp[S(E_{\rm tot})] \approx \frac{\exp[ M \ln Z(\beta_{\rm sad}) + \beta_{\rm sad} E_{\rm tot} ]}{\sqrt{2\pi M\, \partial_\beta^2 \ln Z(\beta_{\rm sad})}}. \qquad (A7) $$

At a few points in the computation we could have made
the decision to first compute the entropy for the
subsystems,

$$ \exp[S(E)] = \int d\hat{\Gamma}\, \delta[E(\hat{\Gamma}) - E], \qquad (A8) $$

so that

$$ \exp[S(E_{\rm tot})] \approx \int \prod_j dE_j\, \exp\Big[ \sum_j S(E_j) \Big]\, \delta\Big[ \sum_j E_j - E_{\rm tot} \Big]. \qquad (A9) $$

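Taylor expanding $S(E_j)$ around the average energy
$\bar{E} = E_{\rm tot}/M$ up to second order gives

$$ \begin{aligned} \exp[S(E_{\rm tot})] &\approx \exp[M S(\bar{E})] \int \prod_j dE_j\, \exp\Big[ \tfrac{1}{2} S''(\bar{E}) \sum_j (E_j - \bar{E})^2 \Big]\, \delta\Big[ \sum_j (E_j - \bar{E}) \Big] \\ &= \exp[M S(\bar{E})]\, M^{-1/2} \left( \frac{2\pi}{-S''(\bar{E})} \right)^{(M-1)/2}. \end{aligned} \qquad (A10) $$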
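
The saddle-point route of this appendix can be tested on a model where the exact answer is known. This is a sketch under stated assumptions: the subsystem density of states is taken as E^a, so that Z(beta) = Gamma(a + 1) beta^-(a + 1), the exact total phase-space density follows from the standard Dirichlet integral, and the second-order expansion of Eq. (A5) is evaluated as exp[M ln Z + beta E_tot] / sqrt(2 pi M d^2 ln Z / d beta^2) at the saddle point. The model and the function names are illustrative and not taken from the text.

```python
import math

def log_exact(e_tot, a, m):
    """ln of the exact phase-space density for subsystem density E**a.

    Assumption: Dirichlet integral over M subsystem energies summing to E_tot.
    """
    n = m * (a + 1)
    return (n - 1) * math.log(e_tot) + m * math.lgamma(a + 1) - math.lgamma(n)

def log_saddle(e_tot, a, m):
    """Second-order saddle-point estimate of Eq. (A5) for the same model."""
    # Z(beta) = Gamma(a + 1) * beta**-(a + 1) is the Laplace transform of E**a.
    beta = m * (a + 1) / e_tot           # solves M dlnZ/dbeta + E_tot = 0
    log_z = math.lgamma(a + 1) - (a + 1) * math.log(beta)
    curvature = m * (a + 1) / beta**2    # M d^2 lnZ / dbeta^2 at the saddle
    return m * log_z + beta * e_tot - 0.5 * math.log(2 * math.pi * curvature)

if __name__ == "__main__":
    a, e_per_subsystem = 0.5, 0.5
    for m in (10, 100, 1000):
        e_tot = m * e_per_subsystem
        print(m, log_saddle(e_tot, a, m) - log_exact(e_tot, a, m))
```

For this model the difference between the two logarithms is the leading Stirling correction, about 1/(12 M (a + 1)), so the saddle-point estimate improves as the number of subsystems grows.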



APPENDIX B: RELATIVE ENTROPY 

The notion of relative entropy arises naturally if one
considers a large ensemble of independent (sub)systems.
Let's consider M subsystems, where each of the systems
can be in any one of the discrete states i with measure
$\nu_i$. Since the systems are independent one can define a
product measure (hyper-volume). The measure of system
1 being in state $i_1$, system 2 in state $i_2$, etc., for M systems
is

$$ \nu^{(\rm prod)}_{i_1 \cdots i_M} = \prod_{j=1}^{M} \nu_{i_j}. \qquad (B1) $$



Alternatively one can count the number of systems that
are in state i. Let this number be $M_i$. Of course
$\sum_i M_i = M$. So, the total product measure of having
$M_1$ subsystems in state 1, $M_2$ in state 2, etc., is

$$ \nu^{(\rm prod)}(M_1, \cdots, M_n) = \frac{M!}{M_1! \cdots M_n!} \prod_{i=1}^{n} \nu_i^{M_i} = \exp\Big[ -M \sum_{i=1}^{n} \rho_i \ln(\rho_i/\nu_i) + o(M) \Big], \qquad (B2) $$

where $\rho_i = M_i/M$. Here the Stirling approximation is
used for the factorials in the multinomial. If one wants
to compute the total entropy, say as function of the total
energy $E_{\rm tot}$, one finds



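$$ \begin{aligned} \exp[S(E_{\rm tot})] &= \sum_{\{M_i\}:\, \sum_j M_j = M} \delta\Big[ E_{\rm tot} - \sum_i M_i E_i \Big]\, \nu^{(\rm prod)}(M_1, \cdots, M_n) \\ &= \sum_{\{M_i\}:\, \sum_j M_j = M} \delta\Big[ E_{\rm tot} - \sum_i M_i E_i \Big]\, \exp[ M S_{\rm rel}(\rho_1, \cdots, \rho_n) + o(M) ]. \end{aligned} \qquad (B3) $$
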
Here the quantity

$$ S_{\rm rel}(\rho_1, \cdots, \rho_n) = -\sum_{i=1}^{n} \rho_i \ln(\rho_i/\nu_i), \qquad (B4) $$

is called the relative entropy. The relative entropy gives,
to leading order, the exponent in Eq. (B3).
Maximization of the relative entropy, subject to certain constraints,
such as $E_{\rm tot}/M = \bar{E} = \int \rho(d\Gamma)\, E(\Gamma)$, can be used for
a saddle-point approximation of Eq. (B3).
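
The statement that the relative entropy gives the leading-order exponent can be checked directly against the multinomial. This is a minimal sketch: the state measures and occupation fractions below are arbitrary illustrative choices, and the function names are hypothetical. It compares the logarithm of the multinomial product measure per subsystem with the relative entropy at fixed occupation fractions while M grows.

```python
import math

def log_product_measure(counts, nu):
    """ln of M!/(M_1! ... M_n!) * prod_i nu_i**M_i (multinomial product measure)."""
    m = sum(counts)
    log_multinomial = math.lgamma(m + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_multinomial + sum(c * math.log(v) for c, v in zip(counts, nu))

def relative_entropy(rho, nu):
    """S_rel = -sum_i rho_i ln(rho_i / nu_i)."""
    return -sum(r * math.log(r / v) for r, v in zip(rho, nu) if r > 0)

def per_subsystem_gap(scale, base_counts=(5, 3, 2), nu=(1.0, 2.0, 0.5)):
    """ln(measure)/M minus S_rel at fixed fractions rho_i = M_i / M.

    Assumption: base_counts and nu are arbitrary example values.
    """
    counts = [scale * c for c in base_counts]
    m = sum(counts)
    rho = [c / m for c in counts]
    return log_product_measure(counts, nu) / m - relative_entropy(rho, nu)

if __name__ == "__main__":
    for scale in (1, 10, 100, 1000):
        print(10 * scale, per_subsystem_gap(scale))
```

The gap per subsystem shrinks roughly like (ln M)/M, which is the o(M) correction in the exponent.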

The relative entropy can be straightforwardly
generalized to the continuous case. If the states are taken to
be macrostates the measure will be the measure of the
underlying microscopic space, i.e., $\exp[S(X)]\,\mu_L(dX)$, so

$$ S_{\rm rel}[\rho] = -\int \rho(dX)\, \ln\big( \rho(dX)/\mu_L(dX) \big) + \int \rho(dX)\, S(X). \qquad (B5) $$

In this point of view, the Gibbs entropy is a special case of
relative entropy. For the microscopic space the entropy is
zero and the measure is the Liouville measure, i.e., $\mu_L(d\Gamma)$.

Note that from the viewpoint discussed here
$\rho(d\Gamma) = \rho(\Gamma)\, d\Gamma$ is proportional to the number of systems near
microstate $\Gamma$. The subsystems are not isolated. They
can, in the example above, exchange energy. Therefore
defining a Gibbs entropy for a phase space density, $\rho(\Gamma)$,
of the ensemble of an isolated system evolving exactly
according to a fixed Hamiltonian cannot be justified.

Relative entropy always arises as an approximation of
a multinomial. In the derivation of the extensive entropy
in appendix A one could have made a change of variables
and used relative entropy. If one writes the integral over
the microstates $\hat{\Gamma}$ of a single subsystem in Eq. (A3) as a
Riemann sum one finds

$$ \Big( \int d\hat{\Gamma}\, \exp[i k E(\hat{\Gamma})] \Big)^{M} \approx \Big( \sum_p \Delta\hat{\Gamma}\, \exp[i k E(\hat{\Gamma}_p)] \Big)^{M}. $$

Now one can compute an entropy as function of the
occupations $M_p$, i.e., the number of subsystems in a state
near $\hat{\Gamma}_p$. The multinomial can be approximated using
the relative entropy. One can perform a saddle-point
approximation to compute the entropy as function of the
total energy. This involves maximization of the relative
entropy with a constraint on the average energy.